Chapter 1


INTRODUCTION 

The Cramer-Rao bound (CRB), the inverse of the Fisher information, is a limit on the performance of parameter estimation under certain regularity conditions: the variance of any unbiased estimator cannot be lower than this bound. The CRB may therefore also be interpreted as a measure of performance potential. In a large number of scenarios, it is of interest to gauge the performance of estimation under parametric constraints. The traditional approach to deriving CRBs in these cases is to find a reparameterization that represents the constraint. However, such an approach is not always feasible for a large class of constraints, and numerical approximations are often restricted to the particular model. An alternative to reparameterization that is equivalent to it, the constrained Cramer-Rao bound, is presented herein; it is also computationally simple to program.

In communications design research, estimation performance metrics are often interpreted as performance potential in order to study whether a model can meet a desired measure of reliability. This approach is often practical, since it avoids the need to search for the best performer over a class of estimators for a particular trial model, and since the CRB is analytically or numerically simple to compute. The downside of this approach is that the CRB must be derived anew for each model. This effectively prevents the practitioner from studying an overly large class of models or, in the context of this work, a large class of constraints on some base model. The constrained Cramer-Rao bound largely eliminates this restriction, since the Fisher information of the base model, which involves an integration over many variables, needs to be evaluated only once.

Chapter 2 offers a quick review of several connections of the Cramer-Rao bound (CRB) within mathematical statistics and serves as a reference point for the study of parameters under parametric equality constraints, discussed in Chapter 3. With the possible exception of the identifiability relationship in section 2.2, much of this chapter is familiar and well represented in standard mathematical statistics texts.

In Chapter 3, a general theory of the constrained Cramer-Rao bound (CCRB) is presented. In section 3.1, the CCRB is defined and proven, alternative formulas are presented, and several interesting properties of the bound are detailed. In section 3.2, a connection is made between the CCRB and two different notions of identifiability under certain conditions. In section 3.3, the linear model with linear constraints is examined in the context of the CCRB. In section 3.4, connections between the CCRB and constrained maximum likelihood estimation are detailed, including an asymptotic normality result and an adaptation of the method of scoring to the constrained parameter scenario. The chapter concludes with the consideration of hypothesis testing under constraints in section 3.5. Chapters 2 and 3 are designed so that their section numbers correspond directly, i.e., section 3.x treats, for constrained parameters, the concept that section 2.x reviews for unconstrained parameters.

In Chapter 4, the analytic tools developed in Chapter 3 are applied in the communications context of the convolutive mixture model (section 4.1) and the calibrated array model (section 4.2). These models are defined and their Fisher information matrices developed in sections 4.1.3 and 4.2.1, respectively. A variety of constraints for these models is considered in sections 4.1.4 and 4.2.2.

1.1  A  note  on  the  notation 

All scalar elements will be denoted in a lowercase math font: $a$. All vectors will be column vectors and will be denoted in a lowercase bold math font: $\mathbf{a}$. Hence, the $i$th element of the column vector $\mathbf{a}$ will be denoted as $a_i$. (This should not be confused with $\mathbf{a}_i$, which is often used for a subvector of the vector $\mathbf{a}$ and will be defined in context.) All matrices will be denoted in an uppercase bold math font: $\mathbf{A}$. All scalars, vectors, and matrices are assumed to have real-valued elements unless otherwise noted as complex-valued (where the imaginary unit $i = j = \sqrt{-1}$ should be clear from context).

For vectors and matrices, $(\cdot)^T$ will denote the transpose operator (do not confuse with $(\cdot)'$, which is occasionally used here as a dummy variable), $(\cdot)^*$ will denote the conjugate operator (do not confuse with $(\cdot)^{\star}$, which is occasionally used as a variant of another vector or matrix), and $(\cdot)^H$ will denote the Hermitian (or conjugate transpose) operator. When a vector depends on another vector, as in $\mathbf{a}(\boldsymbol{\theta})$, its Jacobian will be the matrix
$$\mathbf{A}(\boldsymbol{\theta}) = \left.\frac{\partial \mathbf{a}(\boldsymbol{\theta}')}{\partial \boldsymbol{\theta}'^T}\right|_{\boldsymbol{\theta}'=\boldsymbol{\theta}},$$
whose $i$th row is the transposed gradient of $a_i(\boldsymbol{\theta})$, the $i$th element of $\mathbf{a}(\boldsymbol{\theta})$. For square matrices, $(\cdot)^{-1}$ will denote the inverse of the matrix and $(\cdot)^{\dagger}$ will denote the pseudoinverse of the matrix. Of course, $(\cdot)^2$ will denote the square of the matrix.

For symmetric matrices (or Hermitian matrices in the complex-valued case), the expression $\mathbf{A} > \mathbf{B}$ will denote that the matrix $\mathbf{A} - \mathbf{B}$ is positive definite. Similarly, $\mathbf{A} \geq \mathbf{B}$ will denote that the matrix $\mathbf{A} - \mathbf{B}$ is positive semidefinite.

Sets will be denoted in an uppercase blackboard math font, $\mathbb{A}$, or in an uppercase Greek letter math font, $\Theta$. Also, all sets will be assumed to be open unless otherwise noted.

To help the reader find referenced items, numbered theorems, corollaries, and examples share a single numbering sequence. Thus, the theorem immediately following Example x.4 in Chapter x is numbered Theorem x.5 even though the previous theorem is Theorem x.1.
Chapter 2

THE  CRAMER-RAO  BOUND 

The Cramer-Rao bound (CRB) is a lower bound on the error covariance of any unbiased estimator under certain regularity conditions. As such, it is a measure of the optimal performance of an unbiased estimator under a given model. There are other mean-square error lower bounds, such as the Ziv-Zakai bound [78], the Hammersley-Chapman-Robbins bound [26, 16], the Barankin bound [8], and the Bhattacharyya bound [11], and indeed, depending on the model, there are numerous other possible performance measures, such as classification measures like the bit-error rate (BER) or symbol-error rate (SER). Nevertheless, the CRB remains a very popular benchmark due to its computational simplicity and its well-developed underlying theory.

This theory has led to numerous connections in areas of mathematical statistics, e.g., identifiability, linear models, maximum likelihood (including asymptotic normality and the method of scoring), and hypothesis testing. This short chapter is a quick review of just a few of these connections. A more complete discussion of the utility and application of the topics discussed in this chapter, as well as proofs of the theorems and statements herein, may be found in many standard statistical inference texts, such as Shao [62] or Casella and Berger [14] for statisticians, Kay [36] for signal processors, or Van Trees [72] for engineers.
2.1 Definition


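Suppose we have an observation $\mathbf{x} \in \mathbb{X} \subset \mathbb{R}^n$ from a probability density function (pdf) $p(\mathbf{x};\boldsymbol{\theta})$, where $\boldsymbol{\theta}$ is a vector of deterministic parameters lying in an open set $\Theta \subset \mathbb{R}^m$. The Fisher information matrix (FIM) of this model is given by
$$\mathbf{I}(\boldsymbol{\theta}) = E_{\boldsymbol{\theta}}\left\{\mathbf{s}(\mathbf{x};\boldsymbol{\theta})\,\mathbf{s}^T(\mathbf{x};\boldsymbol{\theta})\right\},$$
where $\mathbf{s}(\mathbf{x};\boldsymbol{\theta})$ is the Fisher score defined by
$$\mathbf{s}(\mathbf{x};\boldsymbol{\theta}) = \left.\frac{\partial \log p(\mathbf{x};\boldsymbol{\theta}')}{\partial \boldsymbol{\theta}'}\right|_{\boldsymbol{\theta}'=\boldsymbol{\theta}},$$
and the expectation is evaluated at $\boldsymbol{\theta}$, i.e., $E_{\boldsymbol{\theta}}\{\cdot\} = \int_{\mathbb{X}} (\cdot)\, p(\mathbf{x};\boldsymbol{\theta})\, d\mathbf{x}$. Suppose also, as regularity conditions, that the pdf is differentiable with respect to $\boldsymbol{\theta}$ and satisfies
$$\frac{\partial}{\partial \boldsymbol{\theta}^T} E_{\boldsymbol{\theta}}\{\mathbf{h}(\mathbf{x})\} = E_{\boldsymbol{\theta}}\left\{\mathbf{h}(\mathbf{x})\,\mathbf{s}^T(\mathbf{x};\boldsymbol{\theta})\right\} \qquad (2.1)$$
for $\mathbf{h}(\mathbf{x}) = 1$ and $\mathbf{h}(\mathbf{x}) = \mathbf{t}(\mathbf{x})$, where $\mathbf{t}(\mathbf{x})$ is an unbiased estimator of $\boldsymbol{\theta}$ [62]. These conditions are assured in a number of scenarios, for example, when the Jacobian and Hessian of the density function $p(\mathbf{x};\boldsymbol{\theta})$ are absolutely integrable with respect to both $\mathbf{x}$ and $\boldsymbol{\theta}$ [72]; essentially, they permit interchanging the order of integration and differentiation. Under these assumptions, we have the following information inequality theorem [14, 62, 36, 72], developed independently by Cramer [17] and Rao [56].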

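Theorem 2.1. The Cramer-Rao bound is the inverse of the FIM,
$$\mathrm{CRB}(\boldsymbol{\theta}) = \mathbf{I}^{-1}(\boldsymbol{\theta}), \qquad (2.2)$$
if it exists, and the variance of any unbiased estimator $\mathbf{t}(\mathbf{x})$ satisfies the inequality
$$\mathrm{Var}(\mathbf{t}(\mathbf{x})) \geq \mathrm{CRB}(\boldsymbol{\theta}),$$
with equality if and only if $\mathbf{t}(\mathbf{x}) - \boldsymbol{\theta} = \mathrm{CRB}(\boldsymbol{\theta})\,\mathbf{s}(\mathbf{x};\boldsymbol{\theta})$ in the mean-square sense.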

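Example 2.2. Let $x \sim \mathcal{CN}(\vartheta, \sigma^2)$ with unknown complex-valued mean $\vartheta$ and known variance $\sigma^2$. In terms of real-valued parameters, the equivalent model is
$$\begin{bmatrix}\mathrm{Re}(x)\\ \mathrm{Im}(x)\end{bmatrix} \sim \mathcal{N}\left(\begin{bmatrix}\mathrm{Re}(\vartheta)\\ \mathrm{Im}(\vartheta)\end{bmatrix}, \frac{\sigma^2}{2}\mathbf{I}_{2\times 2}\right).$$
From a well-known result on normal distributions [36, equation (3.31)],
$$\mathbf{I}\left(\begin{bmatrix}\mathrm{Re}(\vartheta)\\ \mathrm{Im}(\vartheta)\end{bmatrix}\right) = \frac{2}{\sigma^2}\mathbf{I}_{2\times 2},$$
and the CRB is $\frac{\sigma^2}{2}\mathbf{I}_{2\times 2}$.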
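Example 2.2 can be checked numerically by averaging outer products of the score, which is how the FIM is defined above. The following is a minimal Monte Carlo sketch in Python with NumPy; the values of $\sigma^2$ and $\vartheta$ and the variable names are illustrative assumptions, not part of the original example.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 0.5                      # known variance sigma^2 (illustrative value)
vartheta = 1.0 - 2.0j             # true complex mean (illustrative value)
N = 200_000                       # number of Monte Carlo draws

# x ~ CN(vartheta, sigma^2): real and imaginary parts are independent with variance sigma^2 / 2
x = vartheta + np.sqrt(sigma2 / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))

# Score of the real-valued model N([Re, Im], (sigma^2 / 2) I) at the true mean: (2 / sigma^2)(x - mean)
scores = (2.0 / sigma2) * np.column_stack([x.real - vartheta.real, x.imag - vartheta.imag])

fim_hat = scores.T @ scores / N                 # Monte Carlo estimate of E{s s^T}
fim_theory = (2.0 / sigma2) * np.eye(2)         # (2 / sigma^2) I_{2x2}
crb_theory = np.linalg.inv(fim_theory)          # (sigma^2 / 2) I_{2x2}
print(np.round(fim_hat, 2))
```

With $N$ large, `fim_hat` should agree with $(2/\sigma^2)\mathbf{I}_{2\times 2}$ to within Monte Carlo error.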


2.1.1  Extensions 

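The performance of a function of the parameters, e.g., the transformation $\boldsymbol{\alpha} = \mathbf{k}(\boldsymbol{\theta})$, is often of more interest than the performance of the parameters themselves. If the Jacobian of the transformation function is $\mathbf{K}(\boldsymbol{\theta}) = \left.\partial \mathbf{k}(\boldsymbol{\theta}')/\partial \boldsymbol{\theta}'^T\right|_{\boldsymbol{\theta}'=\boldsymbol{\theta}}$, then the CRB on the performance of an unbiased estimator of $\boldsymbol{\alpha}$ is [62, 36]
$$\mathrm{CRB}(\boldsymbol{\alpha}) = \mathbf{K}(\boldsymbol{\theta})\,\mathbf{I}^{-1}(\boldsymbol{\theta})\,\mathbf{K}^T(\boldsymbol{\theta}),$$
i.e., if $\boldsymbol{\delta}(\mathbf{x})$ is an unbiased estimator of $\boldsymbol{\alpha}$, then $\mathrm{Var}(\boldsymbol{\delta}(\mathbf{x})) \geq \mathbf{K}(\boldsymbol{\theta})\,\mathrm{CRB}(\boldsymbol{\theta})\,\mathbf{K}^T(\boldsymbol{\theta})$. Implicit in this inequality for the transformation is that $\boldsymbol{\alpha}$ is differentiable with respect to $\boldsymbol{\theta}$ and that (2.1) is also satisfied for $\mathbf{h}(\mathbf{x}) = \boldsymbol{\delta}(\mathbf{x})$. In particular, if an estimator $\mathbf{t}(\mathbf{x})$ is biased with bias $\mathbf{b}(\boldsymbol{\theta}) = E_{\boldsymbol{\theta}}\{\mathbf{t}(\mathbf{x})\} - \boldsymbol{\theta}$, then $\mathbf{t}(\mathbf{x})$ is an unbiased estimator of $\boldsymbol{\alpha} = \boldsymbol{\theta} + \mathbf{b}(\boldsymbol{\theta})$, and the transformation formula above yields $\mathrm{Var}(\mathbf{t}(\mathbf{x})) \geq \mathrm{CRB}(\boldsymbol{\theta} + \mathbf{b}(\boldsymbol{\theta}))$, where
$$\mathrm{CRB}(\boldsymbol{\theta} + \mathbf{b}(\boldsymbol{\theta})) = \left(\mathbf{I}_m + \mathbf{B}(\boldsymbol{\theta})\right)\mathrm{CRB}(\boldsymbol{\theta})\left(\mathbf{I}_m + \mathbf{B}^T(\boldsymbol{\theta})\right)$$
with $\mathbf{B}(\boldsymbol{\theta}) = \left.\partial \mathbf{b}(\boldsymbol{\theta}')/\partial \boldsymbol{\theta}'^T\right|_{\boldsymbol{\theta}'=\boldsymbol{\theta}}$.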
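Both formulas above are simple matrix products once $\mathbf{I}(\boldsymbol{\theta})$ is known. The sketch below (Python with NumPy) evaluates them for the FIM of Example 2.2, using two illustrative choices that are assumptions of this sketch: the magnitude $\alpha = \|\boldsymbol{\theta}\|$ of the complex mean, and a shrinkage estimator $\mathbf{t}(\mathbf{x}) = c\,\mathbf{x}$ with bias $\mathbf{b}(\boldsymbol{\theta}) = (c-1)\boldsymbol{\theta}$. The function names `crb_transform` and `crb_biased` are hypothetical.

```python
import numpy as np

def crb_transform(K, fim):
    """CRB of alpha = k(theta): K(theta) I^{-1}(theta) K^T(theta)."""
    return K @ np.linalg.solve(fim, K.T)

def crb_biased(B, fim):
    """Bound for an estimator whose bias has Jacobian B: (I_m + B) I^{-1} (I_m + B)^T."""
    m = fim.shape[0]
    return crb_transform(np.eye(m) + B, fim)

sigma2 = 0.5
theta = np.array([1.0, -2.0])                  # [Re(vartheta), Im(vartheta)]
fim = (2.0 / sigma2) * np.eye(2)               # FIM from Example 2.2

# alpha = k(theta) = ||theta||, the magnitude of the complex mean
K = (theta / np.linalg.norm(theta)).reshape(1, 2)
crb_alpha = crb_transform(K, fim)              # 1 x 1 matrix

# Shrinkage estimator t(x) = c x has bias b(theta) = (c - 1) theta, so B = (c - 1) I
c = 0.8
bound_shrunk = crb_biased((c - 1.0) * np.eye(2), fim)
```

Since $\|\mathbf{K}\| = 1$ here, the magnitude inherits the per-component bound $\sigma^2/2$, while the shrinkage estimator's bound scales by $c^2$.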

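Theorem 2.1 requires a nonsingular Fisher information; however, there are a number of interesting cases where this requirement cannot be met, yet the model is still of interest. In this scenario, the pseudoinverse of the FIM is occasionally used as a bound in place of the CRB, i.e.,
$$\mathrm{Var}(\mathbf{t}(\mathbf{x})) \geq \mathbf{I}^{\dagger}(\boldsymbol{\theta})$$
for an unbiased estimator $\mathbf{t}(\mathbf{x})$. This bound inequality is trivial for nonidentifiable functions of the parameters [66]; i.e., the variance is finite only if $\mathbf{H}(\boldsymbol{\theta}) = \mathbf{H}(\boldsymbol{\theta})\,\mathbf{I}(\boldsymbol{\theta})\,\mathbf{I}^{\dagger}(\boldsymbol{\theta})$, where $\mathbf{H}(\boldsymbol{\theta}) = \mathbf{K}(\boldsymbol{\theta}) + \mathbf{B}(\boldsymbol{\theta})$ and $\mathbf{t}(\mathbf{x})$ is a biased estimator of $\boldsymbol{\alpha} = \mathbf{k}(\boldsymbol{\theta})$ with bias $\mathbf{b}(\boldsymbol{\theta})$.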
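The finiteness condition $\mathbf{H} = \mathbf{H}\mathbf{I}\mathbf{I}^{\dagger}$ can be tested directly. The sketch below (Python with NumPy) assumes the singular FIM of the model $x = \theta_1 + \theta_2 + w$ of Example 2.3 with Gaussian noise of unit variance (an assumption), and unbiased estimation ($\mathbf{B} = \mathbf{0}$, so $\mathbf{H} = \mathbf{K}$). The helper `pinv_bound` is hypothetical.

```python
import numpy as np

def pinv_bound(H, fim, tol=1e-10):
    """Return H I^+ H^T, or None when H != H I I^+ (no finite-variance estimator)."""
    fim_pinv = np.linalg.pinv(fim)
    if not np.allclose(H, H @ fim @ fim_pinv, atol=tol):
        return None
    return H @ fim_pinv @ H.T

sigma2 = 1.0
# FIM of x = theta_1 + theta_2 + w, w ~ N(0, sigma^2): singular with rank 1
fim = np.array([[1.0, 1.0], [1.0, 1.0]]) / sigma2

H_sum = np.array([[1.0, 1.0]])       # alpha = theta_1 + theta_2
H_first = np.array([[1.0, 0.0]])     # alpha = theta_1 alone

bound_sum = pinv_bound(H_sum, fim)       # finite: the sum is identifiable
bound_first = pinv_bound(H_first, fim)   # None: theta_1 alone is not identifiable
```

For the identifiable sum, the pseudoinverse bound equals $\sigma^2$, the variance of $x$ itself as an estimator of $\theta_1 + \theta_2$.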

2.2  Identifiability 

The ability to identify parameters determines the validity and utility of certain structural models. Criteria for the identifiability of parameters have numerous connections to parametric statistical measures, such as the Kullback-Leibler distance [13] and the Fisher information matrix [58, 29, 69]. In this section, two of these connections are developed to establish conditions under which a particular parametric model is identifiable.


2.2.1  Local  identifiability 


To examine identifiability criteria based on the CRB, a definition of identifiability is required. A parameter $\boldsymbol{\theta} \in \Theta \subset \mathbb{R}^m$ is identifiable in the model $p(\mathbf{x};\cdot)$ if there is no other $\boldsymbol{\theta}' \in \Theta$ such that $p(\mathbf{x};\boldsymbol{\theta}') = p(\mathbf{x};\boldsymbol{\theta})$ for all $\mathbf{x} \in \mathbb{R}^n$. A parameter is locally identifiable if there exists an open neighborhood of $\boldsymbol{\theta}$ such that $\boldsymbol{\theta}$ is identifiable in that neighborhood.

A parameter is (locally) identifiable in the additive noiseless case if the parameter is (locally) solvable. Estimable parameters, i.e., expected values of functions of the observations [57], are also identifiable.¹ Hence, a nonidentifiable parameter is not estimable regardless of the scheme or the number of observations. Such scenarios arise when some inherent ambiguity exists in the model.

Example 2.3. The parameter vector $\boldsymbol{\theta} = [\theta_1, \theta_2]^T$ in the model
$$x = \theta_1 + \theta_2 + w,$$
where $w$ represents the observation noise, is not identifiable, since it is indistinguishable from the parameter vector $\boldsymbol{\theta}' = [\theta_1 + a, \theta_2 - a]^T$ for any real number $a$.

The Fisher information matrix is called regular if $\mathbf{I}(\boldsymbol{\theta})$ is full rank, and $\boldsymbol{\theta}$ is then said to be a regular point of $\mathbf{I}(\boldsymbol{\theta})$. If the FIM is singular, $\boldsymbol{\theta}$ is a singular point.

Example 2.4. The FIM for the model
$$x = \theta^2 + w,$$
where $w \sim \mathcal{N}(0, 1)$, is $\mathbf{I}(\theta) = 4\theta^2$. For any $\theta \neq 0$, $\theta$ is locally identifiable but not identifiable (since $-\theta$ yields the same distribution), and at the same time it is a regular point of the FIM. For $\theta = 0$, $\theta$ is a singular point, yet it is identifiable.

¹The converse is not true. While it is possible to develop estimation schemes for any identifiable parameters, there is no guarantee that those estimators will be unbiased.

Fisher information regularity implies local identifiability, but, as Example 2.4 demonstrates, the converse is not true. Rothenberg [58] found a connection between local identifiability and the FIM under certain conditions.

Theorem 2.5 (Rothenberg). Assume the FIM $\mathbf{I}(\boldsymbol{\theta})$ has constant rank locally about $\boldsymbol{\theta}$. Then $\boldsymbol{\theta}$ is locally identifiable if and only if $\mathbf{I}(\boldsymbol{\theta})$ is regular.
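The rank condition in Theorem 2.5 can be inspected numerically for Examples 2.3 and 2.4. For a scalar observation $x = \mu(\boldsymbol{\theta}) + w$ with $w \sim \mathcal{N}(0, \sigma^2)$, the FIM is $\nabla\mu\,\nabla\mu^T/\sigma^2$. The sketch below (Python with NumPy, unit noise variance assumed; `fim_scalar_gaussian` is a hypothetical helper) shows that Example 2.3 has constant rank 1 < 2, so the theorem declares it not locally identifiable, while in Example 2.4 the rank drops to 0 at $\theta = 0$, where the constant-rank hypothesis fails and the theorem does not apply.

```python
import numpy as np

def fim_scalar_gaussian(grad_mu, sigma2=1.0):
    """FIM of x = mu(theta) + w with w ~ N(0, sigma2): grad_mu grad_mu^T / sigma2."""
    g = np.atleast_1d(np.asarray(grad_mu, dtype=float))
    return np.outer(g, g) / sigma2

# Example 2.3: mu(theta) = theta_1 + theta_2 has gradient [1, 1] at every theta
grad_23 = lambda th: np.array([1.0, 1.0])
points = np.random.default_rng(9).standard_normal((5, 2))
ranks_23 = [np.linalg.matrix_rank(fim_scalar_gaussian(grad_23(th))) for th in points]

# Example 2.4: mu(theta) = theta^2 has gradient 2 theta, so I(theta) = 4 theta^2
grad_24 = lambda th: 2.0 * th
ranks_24 = {th: np.linalg.matrix_rank(fim_scalar_gaussian(grad_24(th))) for th in (-0.5, 0.0, 0.5)}
fim_24_half = fim_scalar_gaussian(grad_24(0.5))   # 4 * 0.5^2 = 1
```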

2.2.2  Strong  Identifiability 

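Suppose that $p(\mathbf{x};\boldsymbol{\theta})$ is a normal pdf with mean $\boldsymbol{\mu}(\boldsymbol{\theta}) \in \mathbb{R}^p$ and variance $\boldsymbol{\Sigma}(\boldsymbol{\theta})$, whose elements may be explicitly defined by a map $\boldsymbol{\varphi}: \Theta \to \mathbb{R}^q$, where $q \leq p + p(p+1)/2$, and it is assumed that $m \leq q$. Then, by the given definitions, (local) identifiability holds when $\boldsymbol{\varphi}$ is (locally) injective, and, by a transformation on the FIM [12, p. 157], regularity holds when the Jacobian of $\boldsymbol{\varphi}$ has full rank $m$.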
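For a normal model, regularity can be checked by computing the FIM from $\boldsymbol{\mu}(\boldsymbol{\theta})$ and $\boldsymbol{\Sigma}(\boldsymbol{\theta})$ directly. The sketch below (Python with NumPy) uses the standard Gaussian FIM (Slepian-Bangs) formula, $[\mathbf{I}]_{ij} = \partial_i\boldsymbol{\mu}^T\boldsymbol{\Sigma}^{-1}\partial_j\boldsymbol{\mu} + \tfrac12\mathrm{tr}(\boldsymbol{\Sigma}^{-1}\partial_i\boldsymbol{\Sigma}\,\boldsymbol{\Sigma}^{-1}\partial_j\boldsymbol{\Sigma})$, with central-difference derivatives; the helper name, step size, and the scalar example $x \sim \mathcal{N}(\theta_1, \theta_2)$ are assumptions of this sketch.

```python
import numpy as np

def normal_fim(mu, Sigma, theta, eps=1e-6):
    """FIM of x ~ N(mu(theta), Sigma(theta)) via the Slepian-Bangs formula,
    with derivatives approximated by central differences."""
    theta = np.asarray(theta, dtype=float)
    m = theta.size
    Sinv = np.linalg.inv(Sigma(theta))
    dmu, dS = [], []
    for i in range(m):
        e = np.zeros(m)
        e[i] = eps
        dmu.append((mu(theta + e) - mu(theta - e)) / (2 * eps))
        dS.append((Sigma(theta + e) - Sigma(theta - e)) / (2 * eps))
    fim = np.empty((m, m))
    for i in range(m):
        for j in range(m):
            fim[i, j] = dmu[i] @ Sinv @ dmu[j] + 0.5 * np.trace(Sinv @ dS[i] @ Sinv @ dS[j])
    return fim

# Scalar example: x ~ N(theta_1, theta_2), i.e., unknown mean and unknown variance
mu = lambda th: np.array([th[0]])
Sigma = lambda th: np.array([[th[1]]])
fim = normal_fim(mu, Sigma, [0.3, 2.0])
regular = np.linalg.matrix_rank(fim) == 2
```

For this example the exact FIM is $\mathrm{diag}(1/\theta_2,\, 1/(2\theta_2^2))$, which is regular for every $\theta_2 > 0$.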

Suppose there exists a set of indices $i_1, \ldots, i_m \in \{1, \ldots, q\}$ that makes $\boldsymbol{\varphi}^{\star}(\boldsymbol{\theta}) = [\varphi_{i_1}(\boldsymbol{\theta}), \ldots, \varphi_{i_m}(\boldsymbol{\theta})]^T$ injective on $\Theta$. Then each $\boldsymbol{\theta} \in \Theta$ is strongly identifiable and $\boldsymbol{\varphi}^{\star}$ is a representative mapping. By this definition, if $\mathbf{I}(\boldsymbol{\theta})$ is regular at $\boldsymbol{\theta}$, then $\boldsymbol{\theta}$ is in a strongly identifiable open neighborhood, and if $\boldsymbol{\theta}$ is strongly identifiable on $\Theta$, then it is also identifiable on $\Theta$. The converses are not generally true. The following theorem establishes conditions under which the converses are true [29].

Theorem 2.6 (Hochwald and Nehorai). Let $\boldsymbol{\varphi}: \Omega \to \mathbb{C}^q$ be a holomorphic mapping of $\mathbf{z} \in \Omega \subset \bigcup_{\alpha} \Omega_{\alpha}$, where $\Theta \subset \Omega \subset \mathbb{C}^m$ and $\Omega_{\alpha}$ is open in $\mathbb{C}^m$ for each $\alpha$. Then

(a) if $\mathbf{I}(\mathbf{z})$ is regular, there exists a strongly identifiable open neighborhood about $\mathbf{z}$, and

(b) if there exists a representative mapping $\boldsymbol{\varphi}^{\star}: \Omega_{\alpha} \to \mathbb{C}^m$ for each $\alpha$, then $\mathbf{I}(\mathbf{z})$ is regular for every $\mathbf{z} \in \Omega$.

Therefore, the existence of proper holomorphic function(s) equates Fisher information regularity with strong identifiability for normal distributions, and a locally constant rank of the FIM equates regularity with local identifiability for arbitrary distributions.

2.3  Linear  Model 

Suppose we have observations $\mathbf{x}$ from a linear model
$$\mathbf{x} = \mathbf{H}\boldsymbol{\theta} + \mathbf{w}, \qquad (2.3)$$
on $\Theta$, where $\mathbf{H}$ is an observation matrix consisting of known elements and $\mathbf{w}$ is the observation noise, with mean zero and variance $\mathbf{C}$.
2.3.1 Best Linear Unbiased Estimators


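The Gauss-Markov theorem [14] states that the best linear unbiased estimator (BLUE) is given by the (weighted) least squares solution
$$\hat{\boldsymbol{\theta}}_{LS}(\mathbf{x}) = \left(\mathbf{H}^T\mathbf{C}^{-1}\mathbf{H}\right)^{-1}\mathbf{H}^T\mathbf{C}^{-1}\mathbf{x}, \qquad (2.4)$$
so called because it minimizes the (weighted) least squares criterion $(\mathbf{x} - \mathbf{H}\boldsymbol{\theta})^T\mathbf{C}^{-1}(\mathbf{x} - \mathbf{H}\boldsymbol{\theta})$. For any other linear unbiased estimator (LUE) $\mathbf{A}\mathbf{x}$ of $\boldsymbol{\theta}$, $\mathrm{Var}(\mathbf{A}\mathbf{x}) \geq \mathrm{Var}(\hat{\boldsymbol{\theta}}_{LS}(\mathbf{x}))$, with equality if and only if $\mathbf{A} = (\mathbf{H}^T\mathbf{C}^{-1}\mathbf{H})^{-1}\mathbf{H}^T\mathbf{C}^{-1}$. This assumes a full column rank observation matrix $\mathbf{H}$. Otherwise, for any estimable function of the parameters $\mathbf{d}^T\boldsymbol{\theta}$, where $\mathbf{d}$ is in the column space of the transposed observation matrix $\mathbf{H}^T$, its least squares solution is $\mathbf{d}^T\hat{\boldsymbol{\theta}}_{LS}(\mathbf{x})$, where
$$\hat{\boldsymbol{\theta}}_{LS}(\mathbf{x}) = \left(\mathbf{H}^T\mathbf{C}^{-1}\mathbf{H}\right)^{\dagger}\mathbf{H}^T\mathbf{C}^{-1}\mathbf{x}, \qquad (2.5)$$
and $(\cdot)^{\dagger}$ is the generalized pseudoinverse of $(\cdot)$ [57, theorems 11.2B, 11.3A-D]. This solution is also the BLUE, with variance $\mathbf{d}^T(\mathbf{H}^T\mathbf{C}^{-1}\mathbf{H})^{\dagger}\mathbf{d}$.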
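Equations (2.4) and (2.5) translate directly into code. The sketch below (Python with NumPy) computes both forms; the observation matrices, noise covariance, and the helper name `wls` are illustrative assumptions. In the rank-deficient case the first two columns of $\mathbf{H}$ are identical, so only $\theta_1 + \theta_2$ (i.e., $\mathbf{d} = [1, 1, 0]^T$) is estimable.

```python
import numpy as np

def wls(H, C, x, use_pinv=False):
    """Weighted least squares (2.4); with use_pinv=True, the pseudoinverse form (2.5).
    Returns the estimate and (H^T C^{-1} H)^{-1} (or its pseudoinverse)."""
    Ci = np.linalg.inv(C)
    G = H.T @ Ci @ H
    G_inv = np.linalg.pinv(G, rcond=1e-10) if use_pinv else np.linalg.inv(G)
    return G_inv @ H.T @ Ci @ x, G_inv

rng = np.random.default_rng(1)
C = np.diag([1.0, 2.0, 0.5, 1.5])                  # known noise covariance (assumed)

# Full column rank H: estimate and its covariance (H^T C^{-1} H)^{-1}
H_full = rng.standard_normal((4, 2))
x = H_full @ np.array([1.0, -1.0]) + rng.multivariate_normal(np.zeros(4), C)
theta_hat, cov_full = wls(H_full, C, x)

# Rank-deficient H: columns 1 and 2 coincide, so only theta_1 + theta_2 is estimable
H_def = np.column_stack([np.ones(4), np.ones(4), np.arange(4.0)])
x_def = H_def @ np.array([0.7, 0.3, 2.0]) + rng.multivariate_normal(np.zeros(4), C)
theta_ls, G_pinv = wls(H_def, C, x_def, use_pinv=True)
d = np.array([1.0, 1.0, 0.0])                      # d lies in the column space of H^T
est = d @ theta_ls                                 # BLUE of theta_1 + theta_2
var_est = d @ G_pinv @ d                           # d^T (H^T C^{-1} H)^+ d
```

As a consistency check, `est` and `var_est` should match the first component and first variance of a full-rank fit in the reparameterization $[\theta_1 + \theta_2, \theta_3]$, since estimable functions do not depend on the choice of generalized inverse.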

2.3.2  Gaussian  noise 

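Additionally, if the noise has a Gaussian distribution, i.e., $\mathbf{w} \sim \mathcal{N}(\mathbf{0}, \mathbf{C})$, then the least squares solution is also the maximum likelihood estimator (MLE) and the minimum variance unbiased estimator (MVUE).

Theorem 2.7. If the observations obey the linear model in (2.3), where $\mathbf{H}$ is a known full column rank matrix, $\boldsymbol{\theta}$ is an unknown parameter vector, and $\mathbf{w}$ is a zero-mean normal random vector with variance $\mathbf{C}$, then the MVUE is
$$\hat{\boldsymbol{\theta}}_{LS}(\mathbf{x}) = \left(\mathbf{H}^T\mathbf{C}^{-1}\mathbf{H}\right)^{-1}\mathbf{H}^T\mathbf{C}^{-1}\mathbf{x}, \qquad (2.6)$$
with estimator covariance equal to the CRB $\mathbf{I}^{-1}(\boldsymbol{\theta}) = (\mathbf{H}^T\mathbf{C}^{-1}\mathbf{H})^{-1}$.

Similarly, if $\mathbf{H}$ is not full rank and $\mathbf{d}^T\boldsymbol{\theta}$ is an estimable function, then the MLE is $\mathbf{d}^T\hat{\boldsymbol{\theta}}_{MLE}(\mathbf{x})$, where $\hat{\boldsymbol{\theta}}_{MLE}(\mathbf{x}) = \hat{\boldsymbol{\theta}}_{LS}(\mathbf{x})$ from (2.5). This MLE is also the MVUE [57, theorems 11.3F-G].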
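Theorem 2.7 can be illustrated by simulation: applying (2.6) to many independent noise realizations should give a sample covariance close to $(\mathbf{H}^T\mathbf{C}^{-1}\mathbf{H})^{-1}$. The following Python/NumPy sketch assumes a random full-column-rank $\mathbf{H}$, a diagonal $\mathbf{C}$, and an arbitrary true $\boldsymbol{\theta}$; all of these are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, trials = 20, 3, 40_000
H = rng.standard_normal((n, m))                   # known full-column-rank observation matrix (assumed)
C = np.diag(rng.uniform(0.5, 2.0, n))             # known diagonal noise covariance (assumed)
theta = np.array([0.5, -1.0, 2.0])

Ci = np.linalg.inv(C)
crb = np.linalg.inv(H.T @ Ci @ H)                 # I^{-1}(theta) = (H^T C^{-1} H)^{-1}
A = crb @ H.T @ Ci                                # estimator matrix in (2.6)

# Each row of X is one observation x = H theta + w with w ~ N(0, C)
W = rng.standard_normal((trials, n)) * np.sqrt(np.diag(C))
X = theta @ H.T + W
estimates = X @ A.T                               # (2.6) applied to every row
sample_cov = np.cov(estimates, rowvar=False)
rel_err = np.max(np.abs(sample_cov - crb)) / np.max(np.abs(crb))
```

A small `rel_err` (a few percent at this number of trials) indicates the estimator is efficient, and the sample mean of `estimates` should be close to `theta`, confirming unbiasedness.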

2.4  Maximum  likelihood 

Given observations $\mathbf{x}$ from a likelihood (or pdf) $p(\mathbf{x};\boldsymbol{\theta})$ depending on an unknown parameter $\boldsymbol{\theta}$, a popular method of estimating the parameter is the method of maximum likelihood. This approach chooses as the estimator $\hat{\boldsymbol{\theta}}_{ML}(\mathbf{x})$ the parameter value that, if true, would have the highest probability (the maximum likelihood) of resulting in the given observations $\mathbf{x}$, i.e., the solution of the optimization problem
$$\hat{\boldsymbol{\theta}}_{ML}(\mathbf{x}) = \arg\max_{\boldsymbol{\theta}} \log p(\mathbf{x};\boldsymbol{\theta}),$$
where, for convenience, the log-likelihood is maximized instead, which is equivalent since $\log(\cdot)$ is monotone. An analytic solution for the MLE can be found from the first-order conditions on the log-likelihood by considering solutions $\hat{\boldsymbol{\theta}}(\mathbf{x})$ of
$$\mathbf{s}(\mathbf{x};\hat{\boldsymbol{\theta}}) = \mathbf{0}. \qquad (2.7)$$
Provided $\Theta$ is an open set, $\hat{\boldsymbol{\theta}}_{ML}(\mathbf{x})$ will satisfy (2.7).
2.4.1 Efficient estimation


If an efficient estimator exists, it is well known that the method of maximum likelihood finds it [36, exercise 7.12]; i.e., such an estimator must be a stationary point of the maximum-likelihood optimization problem. More formally, we have the following theorem.

Theorem 2.8. If $\mathbf{t}(\mathbf{x})$ is an estimator of $\boldsymbol{\theta}$ that is efficient with respect to the CRB, then the estimator is a stationary point of the optimization problem
$$\max_{\boldsymbol{\theta}} \log p(\mathbf{x};\boldsymbol{\theta}).$$
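A concrete instance of Theorem 2.8: for $n$ iid $\mathcal{N}(\theta, \sigma^2)$ samples with known $\sigma^2$, the sample mean is efficient, and the equality condition of Theorem 2.1, $t(\mathbf{x}) - \theta = \mathrm{CRB}(\theta)\,s(\mathbf{x};\theta)$, holds for every $\theta$. Setting $\theta = t(\mathbf{x})$ then forces the score to vanish. The Python/NumPy sketch below checks both facts; the model and values are assumptions chosen for illustration.

```python
import numpy as np

def score(x, theta, sigma2):
    """Score of n iid N(theta, sigma2) samples: sum(x - theta) / sigma2."""
    return np.sum(x - theta) / sigma2

rng = np.random.default_rng(3)
sigma2, n = 2.0, 50
x = rng.normal(loc=1.5, scale=np.sqrt(sigma2), size=n)
crb = sigma2 / n                                 # inverse FIM of the n samples

t = x.mean()                                     # efficient estimator of theta
for theta in (-1.0, 0.0, 1.5, 4.0):
    # Equality condition of Theorem 2.1: t(x) - theta = CRB(theta) s(x; theta)
    assert np.isclose(t - theta, crb * score(x, theta, sigma2))

stationary_score = score(x, t, sigma2)           # zero: t is a stationary point of log p(x; theta)
```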

2.4.2  Asymptotic  Normality 

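Let the samples $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$ be iid as $\mathbf{x}$ from the pdf $p(\mathbf{x};\boldsymbol{\theta})$. Denote by $\mathbf{y}_n = (\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n)$ the collection of these samples, so that the likelihood is $p(\mathbf{y}_n;\boldsymbol{\theta}) = \prod_{i=1}^{n} p(\mathbf{x}_i;\boldsymbol{\theta})$, and denote the maximum likelihood estimator based on these samples by $\hat{\boldsymbol{\theta}}(\mathbf{y}_n)$.

Theorem 2.9. Assuming the regularity conditions stated earlier on the pdf $p(\mathbf{x};\boldsymbol{\theta})$, the MLE of the parameter $\boldsymbol{\theta}$ is asymptotically distributed according to
$$\sqrt{n}\left(\hat{\boldsymbol{\theta}}(\mathbf{y}_n) - \boldsymbol{\theta}\right) \xrightarrow{d} \mathcal{N}\left(\mathbf{0}, \mathbf{I}^{-1}(\boldsymbol{\theta})\right),$$
where $\mathbf{I}(\boldsymbol{\theta})$ is derived from the pdf $p(\mathbf{x};\boldsymbol{\theta})$, i.e., it is the Fisher information of a single observation or sample.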
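Theorem 2.9 can be observed in simulation. The sketch below (Python with NumPy) assumes iid exponential observations with rate $\lambda$, for which the MLE is $\hat\lambda = 1/\bar{x}$ and the single-sample Fisher information is $1/\lambda^2$; the rate, sample size, and number of trials are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
lam, n, trials = 2.0, 500, 4_000

# Each row is one collection y_n of n iid Exponential(rate = lam) observations
samples = rng.exponential(scale=1.0 / lam, size=(trials, n))
lam_hat = 1.0 / samples.mean(axis=1)              # MLE of the rate for each y_n

z = np.sqrt(n) * (lam_hat - lam)                  # approximately N(0, I^{-1}(lam)) for large n
fisher_single = 1.0 / lam**2                      # Fisher information of one observation
var_ratio = z.var() / (1.0 / fisher_single)       # close to 1 for large n
```

The empirical variance of $\sqrt{n}(\hat\lambda - \lambda)$ approaches $\lambda^2 = I^{-1}(\lambda)$, the single-sample CRB.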
2.4.3 Scoring


There exist a number of approaches to finding the maximum likelihood estimate, some requiring iterative techniques. One such technique is Fisher's method of scoring. Given an observation $\mathbf{x}$ from a likelihood or pdf $p(\mathbf{x};\boldsymbol{\theta})$ depending on an unknown parameter $\boldsymbol{\theta}$, and given some initial estimate $\boldsymbol{\theta}^{(1)}$ of $\boldsymbol{\theta}$, iteratively applying the update
$$\boldsymbol{\theta}^{(k+1)} = \boldsymbol{\theta}^{(k)} + \mathbf{I}^{-1}(\boldsymbol{\theta}^{(k)})\,\mathbf{s}(\mathbf{x};\boldsymbol{\theta}^{(k)}) \qquad (2.8)$$
will find the MLE under certain conditions, e.g., provided the initial estimate is sufficiently close to it, within a locally convex region.
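A classic case where the MLE has no closed form is the location of a Cauchy distribution. Its score is $\sum_i 2(x_i - \theta)/(1 + (x_i - \theta)^2)$ and the FIM of $n$ observations is $n/2$, so (2.8) becomes $\theta^{(k+1)} = \theta^{(k)} + (2/n)\,s(\mathbf{x};\theta^{(k)})$. The Python/NumPy sketch below starts from the sample median; the stopping rule, tolerance, and function names are assumptions of this sketch.

```python
import numpy as np

def cauchy_score(x, theta):
    """Score of iid Cauchy(theta, 1) samples: sum 2 (x - theta) / (1 + (x - theta)^2)."""
    u = x - theta
    return np.sum(2.0 * u / (1.0 + u**2))

def fisher_scoring(x, theta0, tol=1e-10, max_iter=200):
    """Iterate (2.8) using I_n(theta) = n / 2, the FIM of n Cauchy observations."""
    theta = theta0
    for _ in range(max_iter):
        step = cauchy_score(x, theta) / (len(x) / 2.0)
        theta += step
        if abs(step) < tol:
            break
    return theta

rng = np.random.default_rng(5)
x = 3.0 + rng.standard_cauchy(500)
theta_ml = fisher_scoring(x, theta0=np.median(x))   # the median is a consistent starting point
```

Because the Cauchy FIM does not depend on $\theta$, each scoring step costs only one score evaluation, unlike Newton's method, which needs the observed second derivative.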

2.5  Hypothesis  testing 

Given the appearance of the CRB in the asymptotic normality result of section 2.4.2, it is not surprising that connections also exist to some asymptotic hypothesis tests. Assume $\mathbf{h}: \mathbb{R}^m \to \mathbb{R}^r$ is a consistent and nonredundant differentiable function, which defines the null hypothesis
$$H_0: \mathbf{h}(\boldsymbol{\theta}) = \mathbf{0}$$
in the likelihood (or model) $p(\mathbf{y}_n;\boldsymbol{\theta})$ versus the alternative hypothesis $H_1: \mathbf{h}(\boldsymbol{\theta}) \neq \mathbf{0}$.

2.5.1  The  Rao  statistic 

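The Rao (or score) test statistic is given by
$$\rho(\mathbf{y}_n) = \frac{1}{n}\,\mathbf{s}^T(\mathbf{y}_n;\hat{\boldsymbol{\theta}}_h(\mathbf{y}_n))\,\mathbf{I}^{-1}(\hat{\boldsymbol{\theta}}_h(\mathbf{y}_n))\,\mathbf{s}(\mathbf{y}_n;\hat{\boldsymbol{\theta}}_h(\mathbf{y}_n)), \qquad (2.9)$$
where the Fisher score at $\mathbf{y}_n$ is $\mathbf{s}(\mathbf{y}_n;\hat{\boldsymbol{\theta}}_h(\mathbf{y}_n)) = \sum_{i=1}^{n}\mathbf{s}(\mathbf{x}_i;\hat{\boldsymbol{\theta}}_h(\mathbf{y}_n))$ and $\hat{\boldsymbol{\theta}}_h(\mathbf{y}_n)$ is the MLE of $\boldsymbol{\theta}$ under the null hypothesis $\mathbf{h}(\boldsymbol{\theta}) = \mathbf{0}$. A variant of this statistic, the Lagrange multiplier test [3, 63]
$$\frac{1}{n}\,\boldsymbol{\lambda}_n^T\,\mathbf{H}(\hat{\boldsymbol{\theta}}_h(\mathbf{y}_n))\,\mathbf{I}^{-1}(\hat{\boldsymbol{\theta}}_h(\mathbf{y}_n))\,\mathbf{H}^T(\hat{\boldsymbol{\theta}}_h(\mathbf{y}_n))\,\boldsymbol{\lambda}_n,$$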

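was developed by Silvey [63]. The equivalence between the Rao and Lagrange multiplier tests comes from the first-order conditions for maximizing the likelihood subject to the constraint [45], i.e.,
$$\mathbf{s}(\mathbf{y}_n;\hat{\boldsymbol{\theta}}_h(\mathbf{y}_n)) + \mathbf{H}^T(\hat{\boldsymbol{\theta}}_h(\mathbf{y}_n))\,\boldsymbol{\lambda}_n = \mathbf{0}, \qquad \mathbf{h}(\hat{\boldsymbol{\theta}}_h(\mathbf{y}_n)) = \mathbf{0},$$
where $\boldsymbol{\lambda}_n \in \mathbb{R}^r$ is a vector of Lagrange multiplier estimates. First-order Taylor-series expansions of both equations about the true parameter $\boldsymbol{\theta}$ produce
$$\mathbf{s}(\mathbf{y}_n;\boldsymbol{\theta}) - \mathbf{I}_n(\boldsymbol{\theta})\left(\hat{\boldsymbol{\theta}}_h(\mathbf{y}_n) - \boldsymbol{\theta}\right) + \mathbf{H}^T(\hat{\boldsymbol{\theta}}_h(\mathbf{y}_n))\,\boldsymbol{\lambda}_n = o(n^{-1/2}),$$
$$\mathbf{h}(\boldsymbol{\theta}) + \mathbf{H}(\boldsymbol{\theta})\left(\hat{\boldsymbol{\theta}}_h(\mathbf{y}_n) - \boldsymbol{\theta}\right) + o(n^{-1/2}) = \mathbf{h}(\hat{\boldsymbol{\theta}}_h(\mathbf{y}_n)),$$
where $\mathbf{I}_n(\boldsymbol{\theta}) = n\,\mathbf{I}(\boldsymbol{\theta})$ ($n$ times the Fisher information based on a single sample $\mathbf{x}$). Hence, under the null hypothesis, the latter implies $\mathbf{H}(\boldsymbol{\theta})(\hat{\boldsymbol{\theta}}_h(\mathbf{y}_n) - \boldsymbol{\theta}) = o(n^{-1/2})$, and premultiplying the former by $\mathbf{H}(\boldsymbol{\theta})\mathbf{I}^{-1}(\boldsymbol{\theta})$ gives
$$\mathbf{H}(\boldsymbol{\theta})\mathbf{I}^{-1}(\boldsymbol{\theta})\,\mathbf{s}(\mathbf{y}_n;\boldsymbol{\theta}) + \mathbf{H}(\boldsymbol{\theta})\mathbf{I}^{-1}(\boldsymbol{\theta})\mathbf{H}^T(\hat{\boldsymbol{\theta}}_h(\mathbf{y}_n))\,\boldsymbol{\lambda}_n = o(n^{-1/2}).$$
Since $\mathbf{s}(\mathbf{y}_n;\boldsymbol{\theta})$ is asymptotically $\mathcal{N}(\mathbf{0}, \mathbf{I}_n(\boldsymbol{\theta}))$, dividing by $\sqrt{n}$ and applying Slutsky's theorem together with the continuity of the Fisher information and the hypothesis function gives
$$\frac{1}{\sqrt{n}}\,\mathbf{H}(\hat{\boldsymbol{\theta}}_h(\mathbf{y}_n))\,\mathbf{I}^{-1}(\hat{\boldsymbol{\theta}}_h(\mathbf{y}_n))\,\mathbf{H}^T(\hat{\boldsymbol{\theta}}_h(\mathbf{y}_n))\,\boldsymbol{\lambda}_n \xrightarrow{d} \mathcal{N}\left(\mathbf{0}, \mathbf{H}(\boldsymbol{\theta})\mathbf{I}^{-1}(\boldsymbol{\theta})\mathbf{H}^T(\boldsymbol{\theta})\right).$$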
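Therefore, $\frac{1}{\sqrt{n}}\left(\mathbf{H}(\hat{\boldsymbol{\theta}}_h(\mathbf{y}_n))\,\mathbf{I}^{-1}(\hat{\boldsymbol{\theta}}_h(\mathbf{y}_n))\,\mathbf{H}^T(\hat{\boldsymbol{\theta}}_h(\mathbf{y}_n))\right)^{1/2}\boldsymbol{\lambda}_n$ is an $r$-dimensional standard normal variable in distribution, and, since $\mathbf{s}(\mathbf{y}_n;\hat{\boldsymbol{\theta}}_h(\mathbf{y}_n)) = -\mathbf{H}^T(\hat{\boldsymbol{\theta}}_h(\mathbf{y}_n))\,\boldsymbol{\lambda}_n$,
$$\rho(\mathbf{y}_n) \xrightarrow{d} \chi^2_r.$$
Hypothesis $H_0$ is rejected if $\rho(\mathbf{y}_n) > \chi^2_{r,\alpha}$, where $\chi^2_{r,\alpha}$ is the $(1-\alpha)$-quantile of the chi-square distribution with $r$ degrees of freedom.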
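The Rao statistic needs only the restricted MLE, which is especially simple for a point null. The sketch below (Python with NumPy) assumes iid Poisson data and $H_0: \lambda = \lambda_0$, so $h(\lambda) = \lambda - \lambda_0$, $r = 1$, $\hat\lambda_h = \lambda_0$, the sample score is $\sum_i x_i/\lambda_0 - n$, and $I(\lambda_0) = 1/\lambda_0$. The critical value $\chi^2_{1,0.05} \approx 3.8415$ is hard-coded to avoid extra dependencies; the function name is hypothetical.

```python
import numpy as np

CHI2_1_095 = 3.841458820694124   # 0.95-quantile of the chi-square distribution with 1 degree of freedom

def rao_poisson(y, lam0):
    """Rao statistic (2.9) for H0: lambda = lam0 with iid Poisson data.
    Under this point null the restricted MLE is lam0 itself."""
    n = len(y)
    s = np.sum(y) / lam0 - n          # score of the whole sample at lam0
    fisher_single = 1.0 / lam0        # Fisher information of one Poisson observation
    return s**2 / (n * fisher_single)

rng = np.random.default_rng(6)
lam0, n, trials = 4.0, 200, 20_000
stats = np.array([rao_poisson(rng.poisson(lam0, n), lam0) for _ in range(trials)])
rejection_rate = np.mean(stats > CHI2_1_095)   # should be near alpha = 0.05 when H0 holds
```

Under $H_0$ the empirical rejection rate should be close to the nominal level $\alpha = 0.05$, reflecting the asymptotic $\chi^2_1$ distribution derived above.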

2.5.2  The  Wald  statistic 

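The Wald test statistic is given by
$$\upsilon(\mathbf{y}_n) = n\,\mathbf{h}^T(\hat{\boldsymbol{\theta}}(\mathbf{y}_n))\left(\mathbf{H}(\hat{\boldsymbol{\theta}}(\mathbf{y}_n))\,\mathbf{I}^{-1}(\hat{\boldsymbol{\theta}}(\mathbf{y}_n))\,\mathbf{H}^T(\hat{\boldsymbol{\theta}}(\mathbf{y}_n))\right)^{-1}\mathbf{h}(\hat{\boldsymbol{\theta}}(\mathbf{y}_n)), \qquad (2.10)$$
where $\mathbf{H}(\boldsymbol{\theta}) = \left.\partial \mathbf{h}(\boldsymbol{\theta}')/\partial \boldsymbol{\theta}'^T\right|_{\boldsymbol{\theta}'=\boldsymbol{\theta}}$ and $\hat{\boldsymbol{\theta}}(\mathbf{y}_n)$ is an MLE of $\boldsymbol{\theta}$. (The nonredundancy of $\mathbf{h}$ implies that $\mathbf{H}(\cdot)$ has full row rank.) From section 2.1.1, the CRB of $\mathbf{h}(\boldsymbol{\theta})$ in the model $p(\mathbf{x};\boldsymbol{\theta})$ is $\mathbf{H}(\boldsymbol{\theta})\mathbf{I}^{-1}(\boldsymbol{\theta})\mathbf{H}^T(\boldsymbol{\theta})$, and therefore, using Theorem 2.9,
$$\sqrt{n}\left(\mathbf{h}(\hat{\boldsymbol{\theta}}(\mathbf{y}_n)) - \mathbf{h}(\boldsymbol{\theta})\right) \xrightarrow{d} \mathcal{N}\left(\mathbf{0}, \mathbf{H}(\boldsymbol{\theta})\mathbf{I}^{-1}(\boldsymbol{\theta})\mathbf{H}^T(\boldsymbol{\theta})\right).$$
Hence, under $H_0$, using Slutsky's theorem, the convergence in probability of the MLE, and the continuity of the FIM and of the Jacobian of the test function,
$$\upsilon(\mathbf{y}_n) \xrightarrow{d} \chi^2_r.$$
Therefore, the hypothesis $H_0$ is rejected if $\upsilon(\mathbf{y}_n) > \chi^2_{r,\alpha}$.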
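Unlike the Rao statistic, the Wald statistic needs only the unconstrained MLE. The sketch below (Python with NumPy) assumes iid pairs of independent Poisson counts with rates $\boldsymbol{\theta} = [\lambda_1, \lambda_2]^T$ and tests $H_0: h(\boldsymbol{\theta}) = \lambda_1 - \lambda_2 = 0$, so $\mathbf{H} = [1, -1]$ and $\mathbf{I}^{-1}(\boldsymbol{\theta}) = \mathrm{diag}(\lambda_1, \lambda_2)$ per pair. The helper names, rates, and sample sizes are illustrative, and the critical value is hard-coded.

```python
import numpy as np

CHI2_1_095 = 3.841458820694124   # 0.95-quantile of the chi-square distribution with 1 degree of freedom

def wald_statistic(n, h_val, H, fim_inv):
    """Wald statistic (2.10): n h^T (H I^{-1} H^T)^{-1} h, all evaluated at the MLE."""
    h_val = np.atleast_1d(h_val)
    return float(n * h_val @ np.linalg.solve(H @ fim_inv @ H.T, h_val))

def wald_equal_poisson_rates(y):
    """H0: lambda_1 = lambda_2 for iid pairs y[i] = (x_i1, x_i2) of independent Poisson counts."""
    n = y.shape[0]
    lam_hat = y.mean(axis=0)                  # unconstrained MLE
    H = np.array([[1.0, -1.0]])               # Jacobian of h(theta) = theta_1 - theta_2
    fim_inv = np.diag(lam_hat)                # I^{-1}(theta) = diag(lambda_1, lambda_2) per pair
    return wald_statistic(n, lam_hat[0] - lam_hat[1], H, fim_inv)

rng = np.random.default_rng(7)
n, trials = 300, 10_000
null_stats = np.array([wald_equal_poisson_rates(rng.poisson(3.0, (n, 2))) for _ in range(trials)])
size = np.mean(null_stats > CHI2_1_095)             # near 0.05 when H0 holds
alt_stat = wald_equal_poisson_rates(np.column_stack([rng.poisson(3.0, n), rng.poisson(4.0, n)]))
```

Here the statistic reduces to $n(\hat\lambda_1 - \hat\lambda_2)^2/(\hat\lambda_1 + \hat\lambda_2)$; under the alternative with rates 3 and 4 it is far above the critical value.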

2.6  Discussion 

In this chapter, the Cramer-Rao bound (CRB) was defined, and in Theorem 2.1 it was stated to be a bound on the mean-square error performance of an unbiased estimator. The connections of the CRB to a variety of useful theorems and equations in mathematical statistics were then reviewed. Regularity of the FIM, i.e., the existence of the CRB, appears in the conditions for local identifiability (Theorem 2.5) as well as in the conditions for strong identifiability (Theorem 2.6). The CRB appears in the BLUE under a Gaussian model in equation (2.6) and equals its covariance. There is a connection between the existence of an estimator that is efficient with respect to the CRB and the method of ML (Theorem 2.8). The CRB is also the asymptotic variance of the ML estimator (Theorem 2.9) and appears in the update formula (2.8) for the method of scoring. Finally, the CRB appears in the formulas for the Rao test statistic (2.9) and for the Wald test statistic (2.10).

These connections are not exhaustive; e.g., the CRB formula can also be useful in defining confidence regions or as a cost function. However, these are perhaps the most prevalent general topics in mathematical statistics theory and, for that reason, serve as a useful comparison for the constrained Cramer-Rao bound in Chapter 3 and its connections in the theory.
Chapter 3

THE  CONSTRAINED  CRAMER-RAO  BOUND 

While the Cramer-Rao bound (CRB) is a useful measure of parametric estimation performance, it does not inherently measure the performance of estimators of parameters that satisfy side information in the form of a functional equality constraint
$$\mathbf{f}(\boldsymbol{\theta}) = \mathbf{0}. \qquad (3.1)$$

The statistical literature is surprisingly limited in addressing performance measures under this general scenario. The traditional practice is to find some equivalent reparameterization of the particular model and then find the CRB on the parameters of interest using the reparameterized transformation. This approach, however, does not lend itself to theoretical meaning beyond the particular reparameterized model. Works that do examine (3.1) in a general manner typically focus on developing methods for decisions (hypothesis testing) instead of measuring estimation performance. Nevertheless, these results have connections to a CRB incorporating the side information in (3.1), i.e., a constrained CRB. A number of papers, including Aitchison and Silvey [3] and Crowder [18], use the method of Lagrange multipliers to examine the asymptotic normality of the constrained maximum likelihood estimator (CMLE) and, as a consequence, unintentionally develop a CRB under equality constraints. Since, under certain conditions, the asymptotic variance of the MLE equals the CRB, it is reasonable to expect the asymptotic variance of the CMLE to equal the CRB under equality constraints, although the authors did not always appear cognizant of this fact. Others, including Silvey [63] to develop his Lagrange multiplier test, Osborne [55] with linear constraints to develop a scoring algorithm, and Waldorp, Huizenga, and Grasman [73] to develop a Wald-type test, also use the Lagrange multiplier approach in developing a constrained bound. Again, since these authors were primarily focused on asymptotic properties or hypothesis testing, the nature, and perhaps the utility, of this mathematical quantity is not explicitly stated in their work as a CRB or a bound on the estimation performance of parameters under constraints. A constrained bound created strictly for use in performance analysis was not achieved until Gorman and Hero [23]. Gorman and Hero derived such a measure by taking the limit of the Hammersley-Chapman-Robbins bound with test points restricted to lie in the constraint space. This constrained Cramer-Rao bound (CCRB)

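$$\mathbf{I}^{-1}(\boldsymbol{\theta}) - \mathbf{I}^{-1}(\boldsymbol{\theta})\mathbf{F}^T(\boldsymbol{\theta})\left(\mathbf{F}(\boldsymbol{\theta})\mathbf{I}^{-1}(\boldsymbol{\theta})\mathbf{F}^T(\boldsymbol{\theta})\right)^{-1}\mathbf{F}(\boldsymbol{\theta})\mathbf{I}^{-1}(\boldsymbol{\theta}) \qquad (3.2)$$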

utilizes the Jacobian $\mathbf{F}(\boldsymbol{\theta})$ of the functional constraint $\mathbf{f}(\boldsymbol{\theta})$ and the inverse of the Fisher information matrix (FIM) $\mathbf{I}(\boldsymbol{\theta})$ (based on the unconstrained model), which must be nonsingular. As with the CRB, there exist a number of alternative derivations. The works of Crowder, Waldorp et al., Gorman and Hero, and Aitchison [2] include the formula in (3.2) in some manner, in places where the CRB might be used for the unconstrained scenario, a fact that implicitly proves the validity of the CCRB. With the explicit proof by Gorman and Hero as a guideline, Marzetta [47] provides an elementary proof of this CCRB that avoids both the application of the Cauchy-Schwarz inequality and the use of pseudoinverses, by examining the inequality created by the positive semidefiniteness of the variance of a properly defined random variable. A similar construction was used by Stoica and Ng [68] to formulate a more general CCRB

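$$\mathbf{U}(\boldsymbol{\theta})\left(\mathbf{U}^T(\boldsymbol{\theta})\mathbf{I}(\boldsymbol{\theta})\mathbf{U}(\boldsymbol{\theta})\right)^{-1}\mathbf{U}^T(\boldsymbol{\theta}) \qquad (3.3)$$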

that incorporates the constraint information without the assumption of a nonsingular FIM. This CCRB utilizes the unconstrained FIM and a matrix $\mathbf{U}(\boldsymbol{\theta})$ with orthonormal columns that span the null space of the constraint Jacobian $\mathbf{F}(\boldsymbol{\theta})$. Furthermore, when the FIM is nonsingular, the Stoica-Ng CCRB in (3.3) is equivalent to the Gorman-Hero version in (3.2). Hence, while much of the previous work used the formula in (3.2), the more general formula (3.3) applies as well. Osborne, independently of much of this other work, developed a method of scoring with constraints that utilizes the Stoica-Ng CCRB formula in (3.3) as the projection matrix in place of the CRB. There are, of course, numerous instances of matrix structures of the same form as (3.3), for example, as part of the projection matrix of the generalized least squares estimate of the mean of a linear model.

In this section, we develop a very simple derivation of the CCRB in (3.3). Rather than assuming the parameters satisfy functional constraints, we approach the problem from an alternative, yet equivalent, perspective and assume the parameters locally fit a reduced parametric model. This approach permits the extension of the existing classical theory underlying the CRB to the case of a model under parametric constraints. While several of these extensions already exist in the literature, there is no cohesive treatment of these results. However, this chapter should not be viewed as simply a collection of historical results, but as a unified and comprehensive development of the theory of the constrained Cramer-Rao bound.


3.1  The  Constrained  CRB 


Suppose we observe $x \in \mathcal{X} \subset \mathbb{R}^n$ from a probability density function $p(x;\theta)$, where $\theta \in \Theta \subset \mathbb{R}^m$ is a vector of unknown deterministic parameters, and suppose, in addition, that these parameters are required to satisfy $k$ consistent and nonredundant continuously differentiable parametric equality constraints, i.e., $f(\theta) = 0$ for some consistent and nonredundant $f : \Theta \to \mathbb{R}^k$. We denote by

$$\Theta_f = \left\{\theta' \in \Theta : f(\theta') = 0,\ f \text{ consistent, nonredundant}\right\} \tag{3.4}$$

the feasible set satisfying the constraints. Hence, the constraint can also be stated as $\theta \in \Theta_f$. The constraints being consistent means that the set $\Theta_f$ is nonempty. The constraints being nonredundant means that the Jacobian $F(\theta') = \frac{\partial f(\theta')}{\partial \theta'^T}$ has rank $k$ whenever $f(\theta') = 0$.

As before, the Fisher information matrix (FIM) of this model (ignoring the constraint) is given by

$$I(\theta) = E_\theta\left\{s(x;\theta)\, s^T(x;\theta)\right\},$$

where $s(x;\theta)$ is the Fisher score defined by

$$s(x;\theta) = \left.\frac{\partial \log p(x;\theta')}{\partial \theta'}\right|_{\theta'=\theta}$$

and the expectation is evaluated at $\theta$, i.e., $E_\theta(\cdot) = \int_{\mathcal{X}} (\cdot)\, p(x;\theta)\, dx$.

Incorporating this additional side information directly into the information from the observations $x$ would require altering the pdf's dependence on the unknown parameter. Such an approach can often be analytically impractical or numerically complex. Hence, it is desirable to have a formulaic or prescriptive approach that includes the side information, or constraints, indirectly. To meet this need, Stoica and Ng developed a method to incorporate parametric equality constraints into the CRB [68, theorem 1].

Theorem 3.1 (Stoica & Ng). The constrained Cramer-Rao bound on $\theta \in \Theta_f$ is given by

$$\mathrm{CCRB}(\theta) = U(\theta)\left(U^T(\theta)\, I(\theta)\, U(\theta)\right)^{-1} U^T(\theta) \tag{3.5}$$

where $U(\theta)$ is a matrix whose column vectors form an orthonormal basis for the null space of the Jacobian $F(\theta)$, i.e.,

$$F(\theta)\,U(\theta) = 0, \qquad U^T(\theta)\,U(\theta) = I_{(m-k)\times(m-k)}. \tag{3.6}$$

Thus, if $t(x)$ is an unbiased estimator of $\theta$, which satisfies the constraint (3.4), then

$$\mathrm{Var}(t(x)) \ge \mathrm{CCRB}(\theta)$$

with equality if and only if $t(x) - \theta = \mathrm{CCRB}(\theta)\, s(x;\theta)$ in the mean-square sense.¹

¹The original theorem requires the estimator to satisfy the constraint. In general, the parameter and its unbiased estimator will not simultaneously satisfy the constraint, since the implication, namely that $f(E_\theta t(x)) = E_\theta f(t(x))$, holds only under particular conditions. However, the CCRB is the same if either assumption is made exclusively. In this treatise, I assume that the actual parameter $\theta$ satisfies the constraint and the unbiased estimator $t(x)$ does not. (In section 3.4, the constrained maximum likelihood estimator (CMLE) is assumed to satisfy the constraint, but unbiasedness is not assumed.)
The Jacobian $F(\theta)$ need not have full row rank, since the Jacobian does not explicitly appear in the CCRB formula in (3.5). Nor do the columns of $U(\theta)$ need to be orthonormal: it is only required that they be linearly independent and span the null space of the row vectors of $F(\theta)$, i.e., that the column space of $U(\theta)$ be the orthogonal complement of the row space of $F(\theta)$. This is clear from the structure of (3.5), since replacing $U(\theta)$ by $U(\theta)S$ for any invertible $S \in \mathbb{R}^{(m-\mathrm{rank}(F(\theta)))\times(m-\mathrm{rank}(F(\theta)))}$ leaves the CCRB unchanged. Regardless, for convenience, and except where otherwise noted, we will assume that the constraints are nonredundant and the columns of $U(\theta)$ are orthonormal to ensure that $\mathrm{rank}(U(\theta)) = m - k$ and $U^T(\theta)U(\theta) = I_{m-k}$, respectively. The existence of the bound only requires that $U^T(\theta)I(\theta)U(\theta)$, rather than the FIM, be nonsingular.² The original proof of this theorem given by Stoica and Ng considers the variance inequality generated by the random variable $t(x) - \theta - W U(\theta)U^T(\theta)s(x;\theta)$ and optimizes over $W$ to attain the tightest bound for $\mathrm{Var}(t(x) - \theta)$.
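As a numerical sketch of (3.5) and of its invariance to the choice of basis for the null space, the following pure-Python code evaluates the CCRB for a hypothetical three-parameter model. The diagonal FIM and the single linear constraint $\theta_1 + \theta_2 + \theta_3 = 1$ are illustrative assumptions, not taken from the text, and the helper names `transpose`, `matmul`, `inv`, and `ccrb` are likewise hypothetical.

```python
import math


def transpose(A):
    return [list(row) for row in zip(*A)]


def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def inv(A, tol=1e-12):
    """Gauss-Jordan inverse with partial pivoting; raises ValueError if singular."""
    n = len(A)
    M = [[float(x) for x in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < tol:
            raise ValueError("matrix is singular")
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                factor = M[r][c]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]


def ccrb(I, U):
    """Stoica-Ng CCRB (3.5): U (U^T I U)^{-1} U^T."""
    Ut = transpose(U)
    return matmul(matmul(U, inv(matmul(matmul(Ut, I), U))), Ut)


# Hypothetical model: m = 3, k = 1, constraint theta1 + theta2 + theta3 - 1 = 0.
F = [[1.0, 1.0, 1.0]]                  # constraint Jacobian F(theta)
I_fim = [[1.0, 0.0, 0.0],
         [0.0, 2.0, 0.0],
         [0.0, 0.0, 4.0]]              # assumed unconstrained FIM
r2, r6 = math.sqrt(2.0), math.sqrt(6.0)
U = [[1 / r2, 1 / r6],
     [-1 / r2, 1 / r6],
     [0.0, -2 / r6]]                   # orthonormal basis of null(F)
S = [[2.0, 1.0], [0.0, 3.0]]           # arbitrary invertible change of basis
V = matmul(U, S)                       # non-orthonormal basis of the same null space

C_orth = ccrb(I_fim, U)
C_skew = ccrb(I_fim, V)
for row in C_orth:
    print(["%.6f" % v for v in row])
```

Both calls should agree to rounding error. For this assumed FIM the first row works out to $(3/7, -2/7, -1/7)$, which sums to zero, consistent with $\mathrm{CCRB}(\theta)F^T(\theta) = 0$.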


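Example 3.2 (Unit Modulus Constraint). Suppose $\vartheta$ in example 2.2 is constrained to be unit modulus, i.e., $f(\vartheta) = |\vartheta|^2 - 1$. Then its gradient in terms of the parameter vector $\theta = [\mathrm{Re}(\vartheta), \mathrm{Im}(\vartheta)]^T$ is $F(\theta) = [2\mathrm{Re}(\vartheta),\ 2\mathrm{Im}(\vartheta)]$, which has an orthonormal complement $U(\theta) = [-\mathrm{Im}(\vartheta),\ \mathrm{Re}(\vartheta)]^T$. The CCRB for this constraint is then

$$\mathrm{CCRB}(\theta) = \frac{\sigma^2}{2}\begin{bmatrix} \mathrm{Im}(\vartheta)^2 & -\mathrm{Re}(\vartheta)\mathrm{Im}(\vartheta) \\ -\mathrm{Re}(\vartheta)\mathrm{Im}(\vartheta) & \mathrm{Re}(\vartheta)^2 \end{bmatrix}.$$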


2 A  corresponding  regularity  condition  to  that  mentioned  in  section  2.1  will  be  discussed  in 
section  3.1.4. 
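A quick numerical sanity check of the closed form in example 3.2 can be sketched in plain Python. Since example 2.2 is not reproduced here, the sketch assumes that the unconstrained FIM of $\theta$ is $(2/\sigma^2)I_2$; the function name `ccrb_unit_modulus` and the numeric values are hypothetical. Because $m - k = 1$, the inverse in (3.5) reduces to a scalar division.

```python
import math


def ccrb_unit_modulus(re, im, sigma2):
    """CCRB (3.5) for |vartheta| = 1, assuming the FIM is (2 / sigma2) * I_2."""
    u = (-im, re)                                   # U(theta) = [-Im, Re]^T
    uIu = (2.0 / sigma2) * (u[0] ** 2 + u[1] ** 2)  # U^T I U, a scalar here
    return [[u[i] * u[j] / uIu for j in range(2)] for i in range(2)]


sigma2, xi = 0.5, 0.7
re, im = math.cos(xi), math.sin(xi)                 # a feasible point on the unit circle
C = ccrb_unit_modulus(re, im, sigma2)
closed_form = [[sigma2 / 2 * im * im, -sigma2 / 2 * re * im],
               [-sigma2 / 2 * re * im, sigma2 / 2 * re * re]]
print("trace of CCRB:", C[0][0] + C[1][1])          # sigma2 / 2
print("trace of CRB :", 2 * (sigma2 / 2))           # sigma2 under the assumed FIM
```

Under this assumed FIM, the constraint halves the total variance bound, from $\sigma^2$ to $\sigma^2/2$, by removing the radial component entirely.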
3.1.1 A proof of the CCRB


While the proof given by Stoica and Ng is sufficient to establish the validity of the CCRB, it and the other proofs cited above bypass the existing classical theory of the CRB and FIM, which is already sufficient to prove the CCRB. However, before developing the foundation for the CCRB from the existing theory, we require a foray into multivariable calculus and, specifically, the use of the implicit function theorem. The reward for this approach will be a seamless presentation of statistical inference involving the constrained Cramer-Rao bound.

From the perspective of multivariable calculus, the constraint $f(\theta) = 0$ effectively restricts $\theta$ to a manifold $\Theta_f$ of the original vector space $\Theta$, with the manifold having dimension $m - k$, since $k$ degrees of freedom are lost when $\mathrm{rank}(F(\theta)) = k$ for all $\theta \in \Theta_f$. More formally, the following theorem [65, theorems 5-1 and 5-2] applies.

Theorem 3.3 (Implicit Function Theorem). Let $U \subset \mathbb{R}^m$ be an open set and assume $f : U \to \mathbb{R}^k$ is a differentiable function such that $F(\theta)$ has rank $k$ whenever $f(\theta) = 0$. Then $\Theta_f \cap U$ is an $(m-k)$-dimensional manifold in $\mathbb{R}^m$, and for every $\theta \in \Theta_f \cap U$ there is an open set $V \ni \theta$, an open set $W \subset \mathbb{R}^{m-k}$, and a 1-1 differentiable function $g_\theta : W \to \mathbb{R}^m$ such that

(a) $g_\theta(W) = \Theta_f \cap U \cap V$, and

(b) the Jacobian of $g_\theta(\xi)$ has rank $m - k$ for each $\xi \in W$.

Therefore, there exists a function $g_\theta : \mathbb{R}^{m-k} \to \mathbb{R}^m$, and sets $O \ni \theta$ and $P$, open in $\Theta_f$ and $\mathbb{R}^{m-k}$, respectively, such that $g_\theta : P \to O$ is a diffeomorphism on $P$, i.e., a continuously differentiable bijection with a continuously differentiable inverse. A geometric example is shown in Figure 3.1. Note that this diffeomorphism depends on the parameter $\theta$, as the reparameterization is only guaranteed to exist in a local neighborhood of $\theta$; however, for convenience, we will omit this notation in this subsection so that $g = g_\theta$. Thus, we may proceed under the assumption that every $\theta' \in O \subset \Theta_f$ is the image of a unique reduced parameter vector $\xi' \in P \subset \mathbb{R}^{m-k}$ under $g$, or simply

$$\theta' = g(\xi'). \tag{3.7}$$

Necessarily, there exists some unique $\xi \in P$ for which $\theta = g(\xi)$. We will denote the Jacobian of $g$ by $G(\xi) = \left.\frac{\partial g(\xi')}{\partial \xi'^T}\right|_{\xi'=\xi}$, which also implicitly depends on $\theta$.


Figure 3.1: Reparameterization of $f(\theta) = 0$ to $\theta = g(\xi)$.


Example 3.4 (Unit Modulus Constraint). As an example of this principle, consider a complex parameter $\vartheta$ with a modulus constraint (as in example 3.2). The parameter vector in this case may be $\theta = [\theta_1, \theta_2]^T = [\mathrm{Re}(\vartheta), \mathrm{Im}(\vartheta)]^T \in \mathbb{R}^2$ with the constraint being $f(\theta) = \theta_1^2 + \theta_2^2 - 1 = 0$. By the implicit function theorem, constraining $\vartheta$ to be unit modulus is tantamount to assuming the existence of a $\xi \in \mathbb{R}$ such that $\vartheta = e^{j\xi}$. Hence, $\theta$ is a function of $\xi$, i.e.,

$$\theta = \begin{bmatrix} \mathrm{Re}(\vartheta) \\ \mathrm{Im}(\vartheta) \end{bmatrix} = \begin{bmatrix} \cos(\xi) \\ \sin(\xi) \end{bmatrix} = g(\xi). \tag{3.8}$$

Also, $g(\mathbb{R}) = \{\theta \in \mathbb{R}^2 : f(\theta) = 0\}$ and $G(\xi) = [-\sin(\xi),\ \cos(\xi)]^T$ has rank 1 for every $\xi$. For the model in example 3.2, then

$$\mathrm{CCRB}(\theta) = \frac{\sigma^2}{2}\begin{bmatrix} \sin^2(\xi) & -\sin(\xi)\cos(\xi) \\ -\cos(\xi)\sin(\xi) & \cos^2(\xi) \end{bmatrix},$$

which is exactly as before.
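The claim that the particular choice of $g$ does not matter can be checked numerically. The sketch below is a hypothetical illustration, not code from the source: it differentiates two reparameterizations, $\vartheta = e^{j\xi}$ and $\vartheta = e^{-j\xi}$, by central differences, evaluates the reparameterized CRB (3.9) for each, and compares both with (3.5) using $U(\theta) = [-\theta_2, \theta_1]^T$. A deliberately non-isotropic FIM is assumed so that the agreement is not an artifact of symmetry.

```python
import math


def g(xi):
    """theta = g(xi) for vartheta = exp(j xi)."""
    return (math.cos(xi), math.sin(xi))


def g_alt(xi):
    """Alternative reparameterization: vartheta = exp(-j xi)."""
    return (math.cos(xi), -math.sin(xi))


def jacobian(fn, xi, h=1e-6):
    """Central-difference approximation of G(xi) = dg/dxi (a 2-vector)."""
    a, b = fn(xi + h), fn(xi - h)
    return ((a[0] - b[0]) / (2 * h), (a[1] - b[1]) / (2 * h))


def sandwich(G, I):
    """G (G^T I G)^{-1} G^T for a single column G, i.e. (3.9) or (3.5) with m - k = 1."""
    gig = sum(G[i] * I[i][j] * G[j] for i in range(2) for j in range(2))
    return [[G[i] * G[j] / gig for j in range(2)] for i in range(2)]


I_fim = [[3.0, 1.0], [1.0, 2.0]]        # hypothetical, non-isotropic FIM
xi = 0.9
theta = g(xi)
U = (-theta[1], theta[0])                # orthonormal null-space basis of F = [2 theta1, 2 theta2]

bound_ccrb = sandwich(U, I_fim)                       # (3.5)
bound_g = sandwich(jacobian(g, xi), I_fim)            # (3.9) using g
bound_g_alt = sandwich(jacobian(g_alt, -xi), I_fim)   # (3.9) using g_alt; g_alt(-xi) == theta
print(bound_ccrb)
print(bound_g)
print(bound_g_alt)
```

The two Jacobians differ in sign, which cancels in the sandwich form, exactly as the change-of-basis argument in the proof of Theorem 3.5 predicts.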


It must be noted that $g$ will not be unique. For the previous example, $\vartheta = e^{-j\xi}$ is another possible reparameterization. Nor is every $g$ satisfying the theorem necessarily a 1-1 correspondence between $\mathbb{R}^{m-k}$ and $\Theta_f$; again, for the previous example, $g$ is periodic. Thus, the bijection is only guaranteed locally. Finding a particular $g$ for a given $f$ and $\theta$ may not be obvious. Methods for approximating an implicit function will be discussed in section 3.1.5. Regardless, as shall be shown in the context of the CCRB, knowledge of any particular $g$ is unnecessary; only its existence, guaranteed by the implicit function theorem, is needed. Why? Using the implicit function theorem to assume a locally equivalent reparameterization for the constraint limits the information from the observations to the density function's local dependence on the unknown parameter. But the CRB (and hence the CCRB) is a local bound that only characterizes the local noise ambiguities in the model, i.e., the average local curvature of the density at the parameter value of interest. This limitation is therefore immaterial: the bound depends only on the local curvature restricted to the subspace determined by the constraints.
Theorem 3.5 ([50]). The CRB on $\theta \in O$ under the assumption of (3.7) is given by

$$G(\xi)\left(G^T(\xi)\, I(g(\xi))\, G(\xi)\right)^{-1} G^T(\xi) \tag{3.9}$$

and if $g$ in (3.7) is an implicit function of $f(\theta) = 0$, then this bound is equivalent to the constrained Cramer-Rao bound in (3.5).

Proof. This CRB can be developed from a transformation of parameters on the FIM and the CRB. From the transformation of parameters on the CRB (see section 2.1.1), if $t(x)$ is an unbiased estimator³ of $\theta = g(\xi)$, then

$$\mathrm{Var}(t(x)) \ge \mathrm{CRB}(g(\xi)) = G(\xi)\, I^{-1}(\xi)\, G^T(\xi) \tag{3.10}$$

with equality if and only if $t(x) - g(\xi) = G(\xi) I^{-1}(\xi) s(x;\xi)$ in the mean-square sense [36, Appendix 3B], where $I(\xi) = E_\xi\, s(x;\xi)s^T(x;\xi)$ is the FIM on $\xi$ and $s(x;\xi) = \left.\frac{\partial \log q(x;\xi')}{\partial \xi'}\right|_{\xi'=\xi}$ is the Fisher score of the pdf with respect to $\xi$, this pdf being $q(x;\xi) = p(x; g(\xi))$. By the chain rule, the Fisher score of $x$ with respect to $\xi$ is $s(x;\xi) = G^T(\xi)s(x;\theta)$. Hence, from the transformation of parameters on the FIM [12, p. 157], the FIM on $\xi$ is⁴

$$\begin{aligned} I(\xi) &= E\left\{s(x;\xi)\, s^T(x;\xi)\right\} \\ &= E\left\{G^T(\xi)\, s(x;\theta)\, s^T(x;\theta)\, G(\xi)\right\} \\ &= G^T(\xi)\, I(\theta)\, G(\xi) \\ &= G^T(\xi)\, I(g(\xi))\, G(\xi). \end{aligned} \tag{3.11}$$

³Again, the assumption $t(x) \in \Theta_f$ is not actually used here, although if $t(x) \in g(P)$ then $t(x)$ does indeed satisfy the constraint. The theorem's result for unbiased estimators holds regardless, subject to the regularity condition detailed in section 3.1.4.

⁴Implicitly, it is assumed that the regularity conditions of section 2.1 hold with respect to $\xi$. For how this applies to $\theta$, see section 3.1.4.
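Substituting (3.11) into (3.10) establishes the CRB under the assumption $\theta = g(\xi)$. To establish the equivalence with the CCRB, note that since $f \circ g(\xi) = 0$, taking the Jacobian with respect to $\xi$ gives

$$0 = \frac{\partial (f\circ g)(\xi)}{\partial \xi^T} = F(g(\xi))\,G(\xi) = F(\theta)\,G(g^{-1}(\theta)).$$

Hence, the columns of $G(\xi)$ reside in the null space of the row vectors of $F(\theta)$. And since $g$ has an inverse locally at $\xi = g^{-1}(\theta)$, $G(\xi)$ has full column rank $m-k$ locally about $\xi$. (This is true on the whole set $P \subset \mathbb{R}^{m-k}$.) Therefore, $\mathrm{span}(G(g^{-1}(\theta))) = \mathrm{span}(U(\theta))$ and there exists an $(m-k)\times(m-k)$ full rank transformation matrix $S(\theta)$ such that $G(g^{-1}(\theta))S(\theta) = U(\theta)$. (This is true on the whole set $O \subset \Theta_f$.) This matrix $S(\theta)$ is merely an orthonormalizing change of basis on the columns of $G(g^{-1}(\theta))$. Therefore,

$$\begin{aligned} &G(\xi)\left(G^T(\xi) I(g(\xi)) G(\xi)\right)^{-1} G^T(\xi) \\ &\quad= G(g^{-1}(\theta))\left(S^{-T}(\theta) U^T(\theta) I(\theta) U(\theta) S^{-1}(\theta)\right)^{-1} G^T(g^{-1}(\theta)) \\ &\quad= G(g^{-1}(\theta))\, S(\theta)\left(U^T(\theta) I(\theta) U(\theta)\right)^{-1} S^T(\theta)\, G^T(g^{-1}(\theta)) \\ &\quad= U(\theta)\left(U^T(\theta) I(\theta) U(\theta)\right)^{-1} U^T(\theta). \end{aligned}$$

Also, $\mathrm{Var}(t(x)) \ge \mathrm{CCRB}(\theta)$ with equality if and only if

$$\begin{aligned} t(x) - g(\xi) &= G(\xi) I^{-1}(\xi) s(x;\xi) \\ \iff\quad t(x) - \theta &= G(\xi) I^{-1}(\xi) G^T(\xi) s(x;\theta) \\ &= \mathrm{CCRB}(\theta)\, s(x;\theta). \end{aligned}$$

□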




This proof of the CCRB uses the implicit function theorem, the CRB transformation formula, and the FIM transformation formula, as well as well-known properties of rank and the chain rule. An advantage of establishing the CCRB from these classical results will become clear as we establish its connections throughout statistical information theory. An example of this is immediately evident in an alternative proof of a proposition of Stoica and Ng [68, proposition 1].

Corollary 3.6 (Stoica & Ng). Given the regularity conditions on $\xi$, a necessary and sufficient condition for the existence of a finite CCRB of $\theta$ is

$$\left|U^T(\theta)\, I(\theta)\, U(\theta)\right| \ne 0,$$

i.e., $U^T(\theta)I(\theta)U(\theta)$ is nonsingular.

Proof. From the prior theorem, it is clear that $U^T(\theta)I(\theta)U(\theta)$ is nonsingular if and only if $G^T(\xi)I(g(\xi))G(\xi)$ is nonsingular, which holds if and only if $I(\xi)$ is nonsingular. Since a necessary and sufficient condition for the existence of a finite CRB of $g(\xi)$ is that $I(\xi)$ be nonsingular, the corollary is proven.

Thus, with the usual regularity conditions (see section 2.1) being maintained for $\xi$, the conditions for the existence of the CCRB with respect to $\theta$ are equivalent to the conditions for the existence of the CRB in the reduced parameter space of $\xi$.

Moreover, the matrix $U^T(\theta)I(\theta)U(\theta)$ is nonsingular if and only if $U(\theta)$ has full column rank $m-k$ and no nonzero linear combination of the columns of $U(\theta)$ resides in the null space of $I(\theta)$. The first condition is always satisfied by definition. The second condition implies, for example, that if $L(\theta)L^T(\theta)$ is the Cholesky decomposition [20, p. 194] of the FIM, where $L(\theta) \in \mathbb{R}^{m\times \mathrm{rank}(I(\theta))}$ is a full column rank lower triangular matrix with strictly positive values on the diagonal, then $U(\theta) = L(\theta)A(\theta) + K(\theta)B(\theta)$, where $K(\theta)$ is an orthogonal complement of $L(\theta)$, for some full column rank $A(\theta) \in \mathbb{R}^{\mathrm{rank}(I(\theta))\times(m-k)}$ and some $B(\theta)$.
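To illustrate Corollary 3.6 and the decomposition above, consider a hypothetical two-parameter model whose FIM $I = [1,1]^T[1,1]$ is singular: the data carry information about $\theta_1 + \theta_2$ only. Here $L = [1, 1]^T$ and $K \propto [1, -1]^T$. A constraint whose null-space direction has a component along $L$ (nonzero $A$) yields a finite CCRB; one whose null-space direction lies entirely along $K$ ($A = 0$) does not. The sketch below, with a hypothetical function name, treats the case $m - k = 1$.

```python
import math


def ccrb_1d(u, I, tol=1e-12):
    """CCRB (3.5) for m - k = 1; returns None if U^T I U is singular (no finite CCRB)."""
    n = len(u)
    uIu = sum(u[i] * I[i][j] * u[j] for i in range(n) for j in range(n))
    if abs(uIu) < tol:
        return None
    return [[u[i] * u[j] / uIu for j in range(n)] for i in range(n)]


I_sing = [[1.0, 1.0], [1.0, 1.0]]   # rank 1: theta1 - theta2 is not identifiable
r = 1.0 / math.sqrt(2.0)

# f(theta) = theta1 - theta2: F = [1, -1], U = [1, 1]/sqrt(2) = L * (1/sqrt(2))
resolved = ccrb_1d((r, r), I_sing)
# f(theta) = theta1 + theta2: F = [1, 1], U = [1, -1]/sqrt(2) lies in null(I)
unresolved = ccrb_1d((r, -r), I_sing)

print(resolved)     # finite bound
print(unresolved)   # None: this constraint does not remove the ambiguity
```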

Further  properties  of  the  CCRB  will  be  discussed  in  section  3.1.4. 

3.1.2  Alternative  formulas 

The formula Stoica and Ng used to express the constrained Cramer-Rao bound is a generalization of an expression developed earlier, first by Gorman and Hero [23, theorem 1] and later by Marzetta [47, theorem 2]. This is the CCRB formula in (3.12). Although Gorman and Hero's formula requires a nonsingular Fisher information, the version developed by Stoica and Ng appears to be inspired by a result [23, (19) in lemma 2] in Gorman and Hero's paper that unnecessarily assumes a positive definite FIM. However, this result was, in essence, not unknown in the literature. Gorman and Hero were aware of the prior work of Aitchison and Silvey [3, theorem 2 and P on p. 823], which is concerned with the asymptotic variance of the maximum likelihood estimator subject to restraints. But they were perhaps unaware (judging by the lack of citation) of a later paper on hypothesis tests associated with maximum likelihood, in which Aitchison and Silvey suggest a solution to the problem of singular information matrices [4, section 6]. This is the CCRB formula in (3.14). This version of the CCRB was also proven by Crowder [18, theorem 3]. Concurrent with the Gorman and Hero effort, Hendriks [27] and later Oller and Corcuera [53] developed an extension of the Cramer-Rao bound intrinsic to the manifold using Riemannian geometry. More recently, Xavier and Barroso [74, 75] specified a lower bound on the geodesic distance from estimators to the true parameter. The original and latest versions of their bound are expressed in (3.16) and (3.17), respectively.

While  these  CCRB  expressions  are  not  the  focus  of  the  current  treatise,  they 
are  still  important  for  possible  insights  into  the  constrained  Cramer-Rao  bound.  In 
this  section,  aspects  of  these  insights  will  be  briefly  discussed  as  well  as  conditions 
for  equality  with  the  CCRB  expression  in  (3.5). 

3.1.2.1 Gorman-Hero-Aitchison-Silvey CCRB

Aitchison and Silvey used the method of Lagrange multipliers to show that the weighted asymptotic variance of the constrained maximum likelihood estimator, which should (implicitly) be the CCRB, tends to

$$\mathrm{CCRB}_2(\theta) = I^{-1}(\theta) - I^{-1}(\theta)F^T(\theta)\left(F(\theta)I^{-1}(\theta)F^T(\theta)\right)^{-1}F(\theta)I^{-1}(\theta) \tag{3.12}$$

when the Fisher information is nonsingular. Alternatively, Gorman and Hero developed this same CCRB by restricting test points in the Chapman-Robbins bound (a Barankin-type bound) to be in the constraint space $\Theta_f$ and then finding the derivatives as the limit of the finite difference expressions in the Chapman-Robbins bound.⁵ A simpler proof was provided by Marzetta by considering the positive semidefiniteness of the variance of a properly chosen random variable.

A particular advantage of this presentation of the CCRB is the explicit quantification of the gain in performance potential. Imposing a constraint $f(\theta) = 0$ on a set of parameters improves (lowers) the unconstrained bound $I^{-1}(\theta)$ by exactly $I^{-1}(\theta)F^T(\theta)\left(F(\theta)I^{-1}(\theta)F^T(\theta)\right)^{-1}F(\theta)I^{-1}(\theta)$. Since the CRB of $f(\theta)$ is $F(\theta)I^{-1}(\theta)F^T(\theta)$, its inverse is the Fisher information of $f(\theta)$, i.e., of the value $0$ to which it is constrained. And $F^T(\theta)\left(F(\theta)I^{-1}(\theta)F^T(\theta)\right)^{-1}F(\theta)$ is the Fisher information on $\theta$ generated from the constraint $f(\theta) = 0$. A disadvantage is the requirement of a nonsingular FIM: there exist numerous scenarios that require constraints for the original model to be identifiable (see Chapter 4). Additionally, this CCRB formula requires nonredundant constraints, i.e., the Jacobian $F(\theta)$ must have full row rank.

⁵A definition of the Chapman-Robbins bound, as well as a variant of the Gorman and Hero proof that allows for a singular FIM, can be found in Appendix A.1.

The two formulas have the same computational complexity, $O(m^3)$, and when the Fisher information is nonsingular (and the constraints nonredundant), they are equivalent.

Theorem 3.7. When the Fisher information is nonsingular and the constraints are nonredundant, an equivalent formula for the CCRB in (3.5) is $\mathrm{CCRB}_2(\theta)$.

Proof. This proof differs from the one provided in [68, corollary 1].⁶ The existence of the Gorman-Hero-Marzetta formula assumes that the FIM $I(\theta)$ and the CRB of the constraint, $F(\theta)I^{-1}(\theta)F^T(\theta)$, are regular (nonsingular) [23, 47].

Correspondingly, the existence of the Stoica and Ng CCRB formula assumes that

⁶In addition to this alternate proof, they reference an algebraic identity from [37] that is useful in establishing the result.

Lemma 3.8 (Khatri). Suppose $A$ is $p\times q$ and $B$ is $p\times(p-q)$, with ranks $q$ and $p-q$, respectively, such that $B^TA = 0$. Then for any symmetric positive definite matrix $S$,

$$S^{-1} - S^{-1}A\left(A^TS^{-1}A\right)^{-1}A^TS^{-1} = B\left(B^TSB\right)^{-1}B^T.$$

Substituting $I(\theta)$ for $S$, $F^T(\theta)$ for $A$, and $U(\theta)$ for $B$ shows the equivalence between the two CCRBs when the FIM is nonsingular.
$U^T(\theta)I(\theta)U(\theta)$ is nonsingular. Now, right-multiplying both formulas by $F^T(\theta)$ returns the results

$$\begin{aligned} \mathrm{CCRB}(\theta)F^T(\theta) &= U(\theta)\left(U^T(\theta)I(\theta)U(\theta)\right)^{-1}U^T(\theta)F^T(\theta) = 0, \\ \mathrm{CCRB}_2(\theta)F^T(\theta) &= I^{-1}(\theta)F^T(\theta) - I^{-1}(\theta)F^T(\theta) = 0, \end{aligned}$$

since $F(\theta)U(\theta) = 0$ in the first equation and the inverse in $\mathrm{CRB}(f(\theta))$ cancels in the second. Similarly, right-multiplying both formulas by the matrix $I(\theta)U(\theta)$ returns the results

$$\begin{aligned} \mathrm{CCRB}(\theta)I(\theta)U(\theta) &= U(\theta), \\ \mathrm{CCRB}_2(\theta)I(\theta)U(\theta) &= U(\theta), \end{aligned}$$

again because the inverse cancels in the first equation and since $F(\theta)U(\theta) = 0$ in the second. Hence we have the equality

$$\mathrm{CCRB}(\theta)\left[F^T(\theta)\ \ I(\theta)U(\theta)\right] = \mathrm{CCRB}_2(\theta)\left[F^T(\theta)\ \ I(\theta)U(\theta)\right],$$

and if it can be shown that the $m\times m$ matrix $\left[F^T(\theta)\ \ I(\theta)U(\theta)\right]$ is regular, then the two CCRB formulas are equivalent. Suppose there exist vectors $\alpha \in \mathbb{R}^k$ and $\beta \in \mathbb{R}^{m-k}$ such that

$$F^T(\theta)\alpha + I(\theta)U(\theta)\beta = 0. \tag{3.13}$$

Premultiplying (3.13) by $U^T(\theta)$ implies that

$$U^T(\theta)I(\theta)U(\theta)\beta = 0.$$

Since $U^T(\theta)I(\theta)U(\theta)$ is regular, $\beta = 0$. Likewise, premultiplying (3.13) by $F(\theta)I^{-1}(\theta)$ implies that

$$F(\theta)I^{-1}(\theta)F^T(\theta)\alpha = 0.$$

Since $F(\theta)I^{-1}(\theta)F^T(\theta)$ is regular, $\alpha = 0$. Hence $F^T(\theta)\alpha + I(\theta)U(\theta)\beta = 0$ implies $\alpha = 0$ and $\beta = 0$, which proves that $\left[F^T(\theta)\ \ I(\theta)U(\theta)\right]$ is full rank. □
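The equivalence in Theorem 3.7 is easy to confirm numerically. The following sketch assumes a diagonal FIM and the hypothetical linear constraint $\theta_1 + \theta_2 + \theta_3 = 1$ (neither is from the text), computes (3.5), (3.12), and the constraint-induced gain term, and checks that $[F^T(\theta)\ \ I(\theta)U(\theta)]$ is invertible. Helper names are illustrative only.

```python
import math


def transpose(A):
    return [list(row) for row in zip(*A)]


def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def inv(A, tol=1e-12):
    """Gauss-Jordan inverse with partial pivoting; raises ValueError if singular."""
    n = len(A)
    M = [[float(x) for x in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < tol:
            raise ValueError("matrix is singular")
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                factor = M[r][c]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]


# Hypothetical model: FIM diag(1, 2, 4), constraint theta1 + theta2 + theta3 = 1.
F = [[1.0, 1.0, 1.0]]
I_fim = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 4.0]]
s2, s6 = math.sqrt(2.0), math.sqrt(6.0)
U = [[1 / s2, 1 / s6], [-1 / s2, 1 / s6], [0.0, -2 / s6]]

# Stoica-Ng form (3.5)
Ut = transpose(U)
ccrb = matmul(matmul(U, inv(matmul(matmul(Ut, I_fim), U))), Ut)

# Gorman-Hero form (3.12): unconstrained CRB minus the constraint-induced gain
I_inv = inv(I_fim)
W = matmul(I_inv, transpose(F))                             # I^{-1} F^T
gain = matmul(matmul(W, inv(matmul(F, W))), transpose(W))   # I^{-1} F^T (F I^{-1} F^T)^{-1} F I^{-1}
ccrb2 = [[a - b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(I_inv, gain)]

# [F^T  I U] should be nonsingular; inv raises ValueError otherwise
IU = matmul(I_fim, U)
M = [[F[0][i]] + IU[i] for i in range(3)]
M_inv = inv(M)
print(max(abs(ccrb[i][j] - ccrb2[i][j]) for i in range(3) for j in range(3)))
```

The printed discrepancy should be at the level of floating-point rounding, and `gain` is exactly the amount by which the constraint lowers $I^{-1}(\theta)$.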

3.1.2.2 Aitchison-Silvey-Crowder CCRB

The solution of Aitchison and Silvey [4] for resolving the invertibility of singular FIMs in their variance and test results was to load the Fisher information with a matrix of the form $F^T(\theta)F(\theta)$. This was made more rigorous by Crowder [18] by replacing the Fisher information $I(\theta)$ with a loaded FIM $D(\theta) = I(\theta) + F^T(\theta)KF(\theta)$, where $K$ is any positive semidefinite matrix such that $D(\theta)$ is regular. Hence, a generalization of (3.12) is

$$\mathrm{CCRB}_3(\theta) = D^{-1}(\theta) - D^{-1}(\theta)F^T(\theta)\left(F(\theta)D^{-1}(\theta)F^T(\theta)\right)^{-1}F(\theta)D^{-1}(\theta). \tag{3.14}$$

This extension permits a singular FIM. The resulting bound is also independent of the choice of $K$.

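Theorem 3.9. An equivalent formula for the CCRB in (3.5) is $\mathrm{CCRB}_3(\theta)$.

Proof. Replace $I(\theta)$ with $D(\theta)$ in the proof of theorem 3.7, noting that $D(\theta)U(\theta) = I(\theta)U(\theta)$ because $F(\theta)U(\theta) = 0$, so that $U^T(\theta)D(\theta)U(\theta) = U^T(\theta)I(\theta)U(\theta)$.⁷ □

⁷In appendix A.3, this Crowder formula for the asymptotic variance of the constrained maximum likelihood estimate is shown to be equivalent to the CCRB using lemma 3.8 (also see section 3.4.2).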
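Theorem 3.9 and the independence from $K$ can be illustrated on a hypothetical model with a singular FIM, $I = [1,1]^T[1,1]$, and the constraint $\theta_1 - \theta_2 = 0$, so that $F = [1, -1]$. The plain inverse $I^{-1}$ does not exist, but the loaded FIM $D = I + F^TKF$ is invertible for $K > 0$, and (3.14) returns the same matrix for different $K$. This is a plain-Python sketch; the helper names are illustrative.

```python
def transpose(A):
    return [list(row) for row in zip(*A)]


def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def inv(A, tol=1e-12):
    """Gauss-Jordan inverse with partial pivoting; raises ValueError if singular."""
    n = len(A)
    M = [[float(x) for x in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < tol:
            raise ValueError("matrix is singular")
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                factor = M[r][c]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]


def ccrb3(I, F, K):
    """Crowder's formula (3.14) with the loaded FIM D = I + F^T K F."""
    Ft = transpose(F)
    FtKF = matmul(matmul(Ft, K), F)
    D = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(I, FtKF)]
    D_inv = inv(D)
    W = matmul(D_inv, Ft)                                      # D^{-1} F^T
    corr = matmul(matmul(W, inv(matmul(F, W))), transpose(W))  # D^{-1}F^T (F D^{-1} F^T)^{-1} F D^{-1}
    return [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(D_inv, corr)]


I_sing = [[1.0, 1.0], [1.0, 1.0]]   # singular FIM
F = [[1.0, -1.0]]                   # constraint theta1 - theta2 = 0

try:
    inv(I_sing)
    fim_invertible = True
except ValueError:
    fim_invertible = False

bound_k1 = ccrb3(I_sing, F, [[1.0]])
bound_k3 = ccrb3(I_sing, F, [[3.0]])
print(fim_invertible, bound_k1, bound_k3)
```

Both loadings give $0.25$ in every entry, which is also what (3.5) yields with $U = [1, 1]^T/\sqrt{2}$ for this assumed model.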
As the computational complexity of both formulas is $O(m^3)$, there is no direct advantage of one CCRB version over the other. Certainly, in the nonsingular FIM case, $\mathrm{CCRB}_3(\theta)$ is the same as $\mathrm{CCRB}_2(\theta)$ (with $K = 0$). Unfortunately, this CCRB formula does not appear to have simple, inherent connections to other areas of statistical inference. Nevertheless, this possible inefficacy in theoretical applications does not affect its practical use.

3.1.2.3 Xavier's & Barroso's Intrinsic Variance Lower Bound

The prior CCRB metrics were in Euclidean $\mathbb{R}^m$ space, i.e., the lower bound is on the distance (in some direction or dimension) from the estimator to the true value of the parameter, measured by “cutting through” the manifold. In some scenarios, it may be of interest to know the bound on the distance measured “over the surface” of the manifold. Since dimensional directions can be somewhat ambiguous depending on the manifold, of particular interest is the geodesic, or shortest, distance.

For this scenario, Xavier and Barroso [74, 75] formulated an inequality, the intrinsic variance lower bound (IVLB), for the variance of the geodesic distance to an unbiased estimator intrinsic to the manifold. Their results are derived from those of Hendriks [27] and Oller and Corcuera [53]. Ignoring elements of Riemannian geometric theory that are beyond the scope of this presentation, their IVLB result essentially relies on the inequality

$$\sqrt{C\,\mathrm{var}(\vartheta)}\,\cot\left(\sqrt{C\,\mathrm{var}(\vartheta)}\right) \le \sqrt{\frac{\mathrm{var}(\vartheta)}{\lambda_\theta}} \tag{3.15}$$

where $C$ is an upper bound on the sectional curvature of the manifold, $\lambda_\theta$ is a bound on the variance of the Euclidean estimator error, and $\mathrm{var}(\vartheta)$ is the variance of the geodesic. Precisely, $C = \max_{\theta\in\Theta_f} K_\theta$, where

$$K_\theta = \max_{\substack{v_1, v_2 \text{ orthonormal} \\ F(\theta)v_i = 0}} \left\langle II(v_1,v_1), II(v_2,v_2)\right\rangle - \left\langle II(v_1,v_2), II(v_1,v_2)\right\rangle$$

and $II(\cdot,\cdot)$ is the second fundamental form [34] of $\Theta_f$, defined by

$$II(a,b) = -F^T(\theta)\left(F(\theta)F^T(\theta)\right)^{-1}\begin{bmatrix} a^T\nabla^2 f_1(\theta)\, b \\ \vdots \\ a^T\nabla^2 f_k(\theta)\, b\end{bmatrix}_{k\times 1}$$

on $\mathcal{U}\times\mathcal{U}$, where $\mathcal{U} = \mathrm{span}\{U(\theta)\}$.

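Xavier and Barroso use a polynomial bound on the cotangent to reduce (3.15) to a quadratic inequality in terms of $\mathrm{var}(\vartheta)$, which they solve for a lower bound. In an earlier variant of the IVLB [74], Xavier and Barroso chose $\lambda_\theta^{-1} = \max_{v\in\mathcal{U},\,\|v\|=1} v^TI(\theta)v$ and bounded $t\cot(t) \ge 1 - \frac{2}{3}t^2$, for $0 \le t \le T = 1.35$, in (3.15), where $t = \sqrt{C\,\mathrm{var}(\vartheta)}$, to bound the variance of the estimator's geodesic to the mean by

$$\mathrm{var}(\vartheta) \ge \frac{4C + 3\lambda_\theta^{-1} - \sqrt{\lambda_\theta^{-1}\left(9\lambda_\theta^{-1} + 24C\right)}}{\frac{8}{3}C^2}. \tag{3.16}$$

Unfortunately, this bound was optimistic in the limit for the simple Euclidean case ($C = 0$). In a more recent paper [75], the authors improved the bound by choosing $\lambda_\theta = \mathrm{tr}\left(U^T(\theta)I(\theta)U(\theta)\right)^{-1}$ and an alternative lower bound for $t\cot(t)$ to obtain

$$\mathrm{var}(\vartheta) \ge \frac{\lambda_\theta C + 1 - \sqrt{2\lambda_\theta C + 1}}{\frac{1}{2}C^2\lambda_\theta}. \tag{3.17}$$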


Although the authors omitted a proof,⁸ the alternative lower bound for $t\cot(t)$ appears to be $t\cot(t) \ge 1 - \frac{1}{2}t^2$ for $0 \le t \le T$. An immediate benefit from this


8Xavier  and  Barroso  stated  in  [75]  that  the  proof  would  “be  found  in  the  companion  paper 
[14]”,  but  this  companion  paper  appears  to  have  never  been  published. 
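improved bound is the agreement with the simple ($C = 0$) Euclidean case, i.e., $\mathrm{var}(\vartheta) \ge \lambda_\theta = \mathrm{tr}\left(U^T(\theta)I(\theta)U(\theta)\right)^{-1}$. This consistency is entirely due to the choice of $\lambda_\theta$. Even so, greater improvement of the bound, not found in the literature, is possible in the curved scenarios ($C > 0$), at least in the bound on $t\cot(t)$. Note that, for $0 \le t \le T$,

$$\begin{aligned} t\cot(t) &= 1 + \sum_{s=1}^{\infty}\frac{(-1)^s 2^{2s}B_{2s}}{(2s)!}t^{2s} = 1 - 2\sum_{s=1}^{\infty}\zeta(2s)\left(\frac{t}{\pi}\right)^{2s} \\ &\ge 1 - 2\zeta(2)\sum_{s=1}^{\infty}\left(\frac{t}{\pi}\right)^{2s} = 1 - \frac{t^2}{3\left(1 - (t/\pi)^2\right)} \\ &\ge 1 - \frac{t^2}{3\left(1 - (T/\pi)^2\right)} = 1 - \frac{b}{2}t^2, \end{aligned}$$

where $B_{2s}$ are the Bernoulli numbers, $\zeta(\cdot)$ is the Riemann zeta function, and $b = \frac{2}{3\left(1 - (T/\pi)^2\right)} \approx 0.8177$. Then

$$\mathrm{var}(\vartheta) \ge \frac{\lambda_\theta C b + 1 - \sqrt{2\lambda_\theta C b + 1}}{\frac{1}{2}C^2b^2\lambda_\theta}. \tag{3.18}$$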
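Read this way, (3.16), (3.17), and (3.18) are instances of one quadratic argument applied to (3.15) with $t\cot(t) \ge 1 - a t^2$, for $a = 2/3$, $1/2$, and $b/2$, respectively. The sketch below, a hypothetical illustration under that reading, evaluates the generic bound in the algebraically equivalent, cancellation-free form $2\lambda_\theta/\left(1 + 2a\lambda_\theta C + \sqrt{1 + 4a\lambda_\theta C}\right)$ and compares it with the closed forms. Note that (3.16) is written in terms of $\lambda_\theta^{-1}$ and, in the text, uses a different choice of $\lambda_\theta$ than (3.17) and (3.18); only the algebra is compared here. Function names and numbers are illustrative.

```python
import math

T = 1.35
b = 2.0 / (3.0 * (1.0 - (T / math.pi) ** 2))   # approx. 0.8177


def ivlb(lam, C, a):
    """Lower bound on var(geodesic) from (3.15) using t cot t >= 1 - a t^2.

    Equal to (1 + 2a*lam*C - sqrt(1 + 4a*lam*C)) / (2 a^2 lam C^2) for C > 0,
    rewritten without subtractive cancellation so that C = 0 returns lam.
    """
    x = 2.0 * a * lam * C
    return 2.0 * lam / (1.0 + x + math.sqrt(1.0 + 2.0 * x))


def bound_316(lam, C):
    mu = 1.0 / lam                   # (3.16) is written in terms of lambda^{-1}
    return (4 * C + 3 * mu - math.sqrt(mu * (9 * mu + 24 * C))) / (8.0 / 3.0 * C ** 2)


def bound_317(lam, C):
    return (lam * C + 1 - math.sqrt(2 * lam * C + 1)) / (0.5 * C ** 2 * lam)


def bound_318(lam, C):
    return (lam * C * b + 1 - math.sqrt(2 * lam * C * b + 1)) / (0.5 * C ** 2 * b ** 2 * lam)


lam, C = 0.5, 0.3
print(bound_317(lam, C), bound_318(lam, C))   # (3.18) is the larger, i.e. tighter, bound
```

The cancellation-free form is a design choice for the sketch: the textbook forms subtract nearly equal quantities as $C \to 0$, whereas the rewritten form returns $\lambda_\theta$ exactly at $C = 0$.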


3.1.3  Simple  Extensions  to  the  CCRB 

As with unconstrained parameters, the performance of a function of the parameters may often be of greater interest. Consider a continuously differentiable function $k : \Theta_f \to \mathbb{R}^q$. Denote the Jacobian of this transformation by $K(\theta) = \left.\frac{\partial k(\theta')}{\partial\theta'^T}\right|_{\theta'=\theta}$. We have a simple extension of the classical transformation of parameters in section 2.1.1.

Corollary 3.10. If $f(\theta) = 0$, then the variance of any unbiased estimator $S(x)$ of $\alpha = k(\theta)$ satisfies the inequality

$$\mathrm{Var}(S(x)) \ge \mathrm{CCRB}(\alpha) = K(\theta)\,\mathrm{CCRB}(\theta)\,K^T(\theta).$$

Proof. Let $g_\theta$ be the implicit function defined by $f(\theta) = 0$. Then $\alpha = k(g_\theta(\xi))$ has the Jacobian $K(\theta)G_\theta(\xi)$. From the classical transformation of parameters, the inequality $\mathrm{Var}(S(x)) \ge \mathrm{CRB}(\alpha)$ holds, where

$$\begin{aligned} \mathrm{CRB}(\alpha) &= K(\theta)G_\theta(\xi)I^{-1}(\xi)G_\theta^T(\xi)K^T(\theta) \\ &= K(\theta)U(\theta)\left(U^T(\theta)I(\theta)U(\theta)\right)^{-1}U^T(\theta)K^T(\theta). \end{aligned}$$

□
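As a hypothetical illustration of Corollary 3.10, take the unit-modulus model and assume, as a stand-in for example 2.2, an unconstrained FIM of $(2/\sigma^2)I_2$, so that the CCRB is $(\sigma^2/2)U(\theta)U^T(\theta)$ with $U(\theta) = [-\theta_2, \theta_1]^T$. The phase $\alpha = \mathrm{atan2}(\theta_2, \theta_1)$ has Jacobian $K(\theta) = [-\theta_2, \theta_1]/(\theta_1^2 + \theta_2^2)$, and the modulus $\sqrt{\theta_1^2 + \theta_2^2}$ has Jacobian $[\theta_1, \theta_2]/\sqrt{\theta_1^2 + \theta_2^2}$. The sketch evaluates $K(\theta)\,\mathrm{CCRB}(\theta)\,K^T(\theta)$ for both.

```python
import math


def quad_form(k, M):
    """k M k^T for a 1 x 2 row vector k and a 2 x 2 matrix M."""
    return sum(k[i] * M[i][j] * k[j] for i in range(2) for j in range(2))


sigma2, xi = 0.2, 0.4
th1, th2 = math.cos(xi), math.sin(xi)          # a feasible theta (unit modulus)
u = (-th2, th1)                                # U(theta)
ccrb = [[sigma2 / 2 * u[i] * u[j] for j in range(2)] for i in range(2)]
crb = [[sigma2 / 2, 0.0], [0.0, sigma2 / 2]]   # unconstrained CRB under the assumed FIM

rho2 = th1 ** 2 + th2 ** 2
K_phase = (-th2 / rho2, th1 / rho2)                        # Jacobian of atan2(theta2, theta1)
K_mod = (th1 / math.sqrt(rho2), th2 / math.sqrt(rho2))     # Jacobian of the modulus

phase_ccrb = quad_form(K_phase, ccrb)   # sigma2 / 2
phase_crb = quad_form(K_phase, crb)     # also sigma2 / 2
mod_ccrb = quad_form(K_mod, ccrb)       # 0 up to rounding: the modulus is fixed by the constraint
print(phase_ccrb, phase_crb, mod_ccrb)
```

Under these assumptions the constraint leaves the phase bound unchanged and drives the modulus bound to zero, which is what one would expect from a constraint that pins the modulus exactly.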

This  transformation  property  is  useful  in  extending  the  constrained  Cramer- 
Rao  bound  to  biased  estimation. 


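Example 3.11 (Biased Estimation). Assume $t(x)$ is a biased estimator of a constrained parameter $\theta$ with bias $b(\theta) = E_\theta t(x) - \theta$ and constraint $f(\theta) = 0$. Define $k(\theta) = \theta + b(\theta)$. Then $t(x)$ is an unbiased estimator of $\alpha = k(\theta)$. Since $K(\theta) = I_{m\times m} + B(\theta)$, where $B(\theta) = \left.\frac{\partial b(\theta')}{\partial\theta'^T}\right|_{\theta'=\theta}$, we have the inequality

$$\mathrm{Var}(t(x)) \ge \left(I_{m\times m} + B(\theta)\right)\mathrm{CCRB}(\theta)\left(I_{m\times m} + B^T(\theta)\right) = \mathrm{CCRB}(\theta + b(\theta)).$$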
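A short sketch of Example 3.11 under hypothetical choices: an assumed FIM $\mathrm{diag}(1, 2, 4)$, the constraint $\theta_1 + \theta_2 + \theta_3 = 1$, and two affine biases that do not come from the text. For a uniform shrinkage $b(\theta) = (c-1)\theta$, $B(\theta) = (c-1)I$ and the bound is simply $c^2\,\mathrm{CCRB}(\theta)$. For a bias whose Jacobian has the form $B(\theta) = vF(\theta)$, $B(\theta)\,\mathrm{CCRB}(\theta) = 0$ because $F(\theta)\,\mathrm{CCRB}(\theta) = 0$, so the bound is unchanged. Exact rational arithmetic from Python's `fractions` module keeps the comparisons exact.

```python
from fractions import Fraction as Fr

# Assumed model: FIM diag(1, 2, 4), constraint theta1 + theta2 + theta3 = 1 (F = [1, 1, 1]).
fim_diag = [Fr(1), Fr(2), Fr(4)]
F = [Fr(1), Fr(1), Fr(1)]
w = [F[i] / fim_diag[i] for i in range(3)]            # I^{-1} F^T
s = sum(F[i] * w[i] for i in range(3))                # F I^{-1} F^T
ccrb = [[(1 / fim_diag[i] if i == j else Fr(0)) - w[i] * w[j] / s for j in range(3)]
        for i in range(3)]                            # (3.12), which equals (3.5) here


def matmul(A, B):
    return [[sum(A[i][l] * B[l][j] for l in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(r) for r in zip(*A)]


def biased_bound(B_jac):
    """(I + B) CCRB (I + B)^T, the bound of Example 3.11."""
    K = [[(1 if i == j else 0) + B_jac[i][j] for j in range(3)] for i in range(3)]
    return matmul(matmul(K, ccrb), transpose(K))


c = Fr(4, 5)
B_shrink = [[(c - 1) if i == j else Fr(0) for j in range(3)] for i in range(3)]
v = [Fr(1), Fr(-2), Fr(3)]                                        # arbitrary vector
B_normal = [[v[i] * F[j] for j in range(3)] for i in range(3)]    # B = v F

shrink = biased_bound(B_shrink)
normal = biased_bound(B_normal)
print(shrink[0][0], normal[0][0])
```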


Often, when the Fisher information matrix is singular, its pseudoinverse is used as a bound on the variance of an estimator. As mentioned in section 2.1.1, unless certain conditions hold, this bound is only trivially satisfied for some components of the estimator, i.e., those components have infinite variance.
Corollary 3.12. If $t(x)$ is an estimator of $k(\theta)$ having bias $b(\theta)$, where $\theta$ satisfies the constraint $f(\theta) = 0$, then the inequality

$$\mathrm{Var}(t(x)) \ge H(\theta)U(\theta)\left(U^T(\theta)I(\theta)U(\theta)\right)^{\dagger}U^T(\theta)H^T(\theta),$$

where $H(\theta) = K(\theta) + B(\theta)$ and $^{\dagger}$ denotes the pseudoinverse, is nontrivially satisfied, i.e., all components of $t(x)$ have finite variance, if and only if

$$H(\theta)U(\theta) = H(\theta)U(\theta)\left(U^T(\theta)I(\theta)U(\theta)\right)^{\dagger}U^T(\theta)I(\theta)U(\theta).$$

Proof.  Suppose  t(x)  is  an  estimator  of  a.  =  k{0)  with  bias  b(G)  and  under  the 

constraint  f(G)  =  0.  Define  H{0)  =  (k{0)  +  b{0))  I  ,  =  K{0)  +  B(0).  If 

0  =0 

go  is  the  implicit  function  defined  by  /,  then  we  have  the  inequality 

Var(t(*))  >  H(S)iHZ)HT(S) 

where  H({)  =  ^  (k(ge(£))  +  b(ge(£')))  =  H(0)Ge(t).  (This  can  also 

be  inferred  from  [23,  lemma  2],  although  no  assumption  is  made  here  about  the 
singularity  of  the  Fisher  information.)  Hence,  all  the  components  of  t(x)  can  have 
finite  variance  if  and  only  if  [66] 

m)  =  mm)?® 

H(G)Go^)  =  H(0)Ge(t)P(t)i(t) 

=  H(d)Go(Z)P(Z)GTe(Z)I(d)Go(Z) 

=  H(0)U{0 )  (UT(0)I(0)U(6))]  UT(0)I(0)Ge(£) 
H{0)U{0)  =  H(G)U(G){UT(G)I(0)U(G))]UT(G)I(G)U(G). 

□ 
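The nontriviality condition of Corollary 3.12 can be checked numerically. The sketch below assumes NumPy, an SVD-based orthonormal null-space basis, and a hypothetical three-parameter model in which $\theta_2$ carries no information and the constraint $\theta_3 = \text{const}$ does not remove that ambiguity; the helper name `corollary_3_12` and the pseudoinverse tolerance are illustrative choices.

```python
import numpy as np


def null_space_basis(F):
    """Orthonormal basis for the null space of F (columns of U, with F U = 0)."""
    F = np.atleast_2d(F)
    _, _, vt = np.linalg.svd(F)
    return vt[np.linalg.matrix_rank(F):].T


def corollary_3_12(H, fim, F, rcond=1e-10):
    """Return (bound, nontrivial) for Var(t(x)) >= H U (U^T I U)^+ U^T H^T.

    `nontrivial` checks H U = H U (U^T I U)^+ (U^T I U) up to round-off.
    """
    U = null_space_basis(F)
    J = U.T @ fim @ U                     # information in the constraint space
    J_pinv = np.linalg.pinv(J, rcond=rcond)
    bound = H @ U @ J_pinv @ U.T @ H.T
    nontrivial = np.allclose(H @ U, H @ U @ J_pinv @ J, atol=1e-8)
    return bound, nontrivial


# Hypothetical model: zero information on theta2, constraint fixes theta3 only.
fim = np.diag([1.0, 0.0, 0.0])
F = np.array([[0.0, 0.0, 1.0]])
_, all_components = corollary_3_12(np.eye(3), fim, F)                  # estimate all of theta
bound1, first_only = corollary_3_12(np.diag([1.0, 0.0, 0.0]), fim, F)  # estimate theta1 only
```

Estimating the full vector fails the condition (the bound is only trivially satisfied), whereas estimating $\theta_1$ alone passes it.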
By definition, $U(\theta)$ always has full column rank, so singularity of the matrix $U^T(\theta)I(\theta)U(\theta)$ implies that the FIM $I(\theta)$ is singular and that the constraints are insufficient to resolve the inherent ambiguities in the model. This scenario will be discussed in greater detail in the next section.

3.1.4 Properties of the CCRB

The constrained Cramer-Rao bound for the constraint $f(\theta) = 0$ is equivalent to the CRB for any reparameterization of the parameters satisfying the constraint. Implicit in that equivalence, and in the proof of the CCRB presented in section 3.1.1, are the regularity conditions for the CRB on the implicit parameter $\xi$, i.e., $\frac{\partial}{\partial\xi^T}E_\xi(h(x)) = E_\xi\left(h(x)s^T(x;\xi)\right)$ for $h(x) = 1$ and $h(x) = t(x)$, where $t(x)$ in this case is an unbiased estimator of $\theta = g(\xi)$. This condition translates to $\frac{\partial}{\partial\theta^T}E_\theta(h(x))\,G(\xi) = E_\theta\left(h(x)s^T(x;\theta)G(\xi)\right)$ or, strictly in terms of $\theta$,

$$\frac{\partial}{\partial\theta^T}E_\theta(h(x))\,U(\theta) = E_\theta\left(h(x)s^T(x;\theta)U(\theta)\right) \tag{3.19}$$

for $h(x) = 1$ and $h(x) = t(x)$. From this condition, it can be shown that

$$E_\theta\left[(t(x) - \theta)s^T(x;\theta)\right]U(\theta)U^T(\theta) = U(\theta)U^T(\theta) \tag{3.20}$$

(take $h(x) = 1$ to obtain $E_\theta\left(s^T(x;\theta)\right)U(\theta) = 0$, take $h(x) = t(x)$ with $\partial E_\theta t(x)/\partial\theta^T = I_{m\times m}$, combine, and post-multiply by $U^T(\theta)$), which is a variant of the regularity condition required and proven by Marzetta [47], and the same regularity condition simply stated (but not proven) by Stoica and Ng [68], in their proofs of the CCRB. Thus, as with the CCRB itself, the regularity condition for the constraint is equivalent to the regularity condition for the reparameterization of parameters satisfying the constraint. This general fact, a restatement of theorem 3.5, is the quintessential property of the CCRB formula. More explicitly, the CCRB generalizes both the degenerate and the determinate constraint cases, as examples 3.13 and 3.14 demonstrate.

Example 3.13 (Degenerate Case). The scenario with no constraint is equivalent to the statement that the function describing the constraint is null. That is, $f: \Theta \to \mathbb{R}^0$ and $f(\theta) = [\,]$. Then $F: \Theta \to \mathbb{R}^{0\times m}$ is also a null gradient row vector having rank 0. Any orthogonal $m\times m$ matrix $U(\theta)$ (for instance, $I_{m\times m}$) satisfies (3.6), and thus, when $I(\theta)$ is nonsingular,

$$U(\theta)\left(U^T(\theta)I(\theta)U(\theta)\right)^{-1}U^T(\theta) = I^{-1}(\theta).$$

Therefore the CCRB formula also incorporates the unconstrained scenario.

Example 3.14 (Determinate Case). Suppose the constraint $f$ completely determines the parameter. Then, since there are $m$ unknowns in the parameter vector $\theta$, there must necessarily be at least $k \ge m$ equations in the constraint equation, and the Jacobian $F(\theta)$ must have rank $m$. Since $f$ is assumed to have nonredundant constraints, $F(\theta)$ is actually a nonsingular square matrix. Only the null vector $U: \Theta \to \mathbb{R}^{m\times 0}$ satisfies (3.6) ($U^T(\theta)U(\theta) = I_{0\times 0}$); therefore $U(\theta)\left(U^T(\theta)I(\theta)U(\theta)\right)^{-1}U^T(\theta)$ is a null element, and the CCRB does not exist for a completely determined parameter set.

Thus far, only parametric equality constraints have been considered. Gorman and Hero show that only active constraints result in a reduction in the bound [23, lemma 4], i.e., inactive (or strict) inequality constraints do not contribute information to the model in the CCRB sense.⁹ As the test points approach the parameter in the Chapman-Robbins bound, they become interior points of the constraint set, and the inequalities have no impact on the information metric. This can also be shown without resorting to the Chapman-Robbins approach.

⁹ Because the CCRB (CRB) is a local bound that only accounts for local fluctuations of the true parameter, inactive constraints have no impact on performance potential. As such, the CCRB (CRB) only provides information on the pdf for the mainlobe of the density function, which typically corresponds to the true parameter. For a number of scenarios, e.g., when there is sufficiently large variance in the pdf, sidelobes of the distribution impact the performance, thereby making the CCRB (CRB) overly optimistic. This occurs frequently in communications when the signal-to-noise ratio (SNR) or data transmission size decreases.

Example 3.15 (Inequality Constraints Are Non-Informative). Assume that, in addition to the constraint set $\Theta_f$, the parameters are required to satisfy the strict inequality $h(\theta) < 0$, where $h: \Theta \to \mathbb{R}$ is a continuously differentiable function. To incorporate this constraint, we introduce a dummy parameter $\vartheta$ and add a new equality constraint $\vartheta^2 = -h(\theta)$, which is equivalent to the strict inequality constraint (whenever $\vartheta \ne 0$). In addition, we create an extended parameterization $\phi = \begin{bmatrix}\theta\\ \vartheta\end{bmatrix} \in \mathbb{R}^{m+1}$ and constraint function

$$f^*(\phi) = \begin{bmatrix}f(\theta)\\ \vartheta^2 + h(\theta)\end{bmatrix}.$$

This will generate a Fisher information and a Jacobian matrix defined by

$$I^*(\phi) = \begin{bmatrix}I(\theta) & 0\\ 0 & 0\end{bmatrix} \quad\text{and}\quad F^*(\phi) = \begin{bmatrix}F(\theta) & 0\\ H^T(\theta) & 2\vartheta\end{bmatrix},$$

respectively, where $H(\theta) = \left.\frac{\partial h(\theta')}{\partial\theta'}\right|_{\theta'=\theta}$. Note that $F^*(\phi)$ will be full row rank as long as $\vartheta \ne 0$. If $U(\theta)$ is defined as in (3.6), then the columns of

$$U^*(\phi) = \begin{bmatrix}U(\theta)\\ v^T(\theta,\vartheta)\end{bmatrix}$$

will span the null space of $F^*(\phi)$, where $v(\theta,\vartheta) = -\frac{1}{2\vartheta}U^T(\theta)H(\theta)$; since the CCRB expression is unchanged when $U^*(\phi)$ is replaced by $U^*(\phi)W$ for any nonsingular $W$, this basis may be used in place of one satisfying (3.6) exactly. By theorem 3.1, then

$$\begin{aligned}
U^*(\phi)\left(U^{*T}(\phi)I^*(\phi)U^*(\phi)\right)^{-1}U^{*T}(\phi)
&= U^*(\phi)\left(U^T(\theta)I(\theta)U(\theta)\right)^{-1}U^{*T}(\phi)\\
&= \begin{bmatrix}
U(\theta)\left(U^T(\theta)I(\theta)U(\theta)\right)^{-1}U^T(\theta) & U(\theta)\left(U^T(\theta)I(\theta)U(\theta)\right)^{-1}v(\theta,\vartheta)\\
v^T(\theta,\vartheta)\left(U^T(\theta)I(\theta)U(\theta)\right)^{-1}U^T(\theta) & v^T(\theta,\vartheta)\left(U^T(\theta)I(\theta)U(\theta)\right)^{-1}v(\theta,\vartheta)
\end{bmatrix}.
\end{aligned}$$

Hence, the CCRB on the $\theta$ components of $\phi$, the upper-left submatrix, remains unchanged.
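The invariance in example 3.15 can be confirmed numerically. This is a sketch under stated assumptions: NumPy, a randomly generated nonsingular FIM, one hypothetical equality constraint, a hypothetical gradient $H(\theta)$ of $h$, and $\vartheta = 0.7$; the names `null_space_basis`, `ccrb`, and `augmented_ccrb` are illustrative. The augmented bound is computed from an SVD basis of the null space of $F^*(\phi)$ rather than from the specific $U^*(\phi)$ above, which relies on the basis invariance noted in the example.

```python
import numpy as np


def null_space_basis(F):
    """Orthonormal basis for the null space of F (F U = 0, U^T U = I)."""
    F = np.atleast_2d(F)
    _, _, vt = np.linalg.svd(F)
    return vt[np.linalg.matrix_rank(F):].T


def ccrb(fim, F):
    """U (U^T I U)^{-1} U^T for an orthonormal null-space basis U of F."""
    U = null_space_basis(F)
    return U @ np.linalg.inv(U.T @ fim @ U) @ U.T


def augmented_ccrb(fim, F, H, vartheta):
    """CCRB for phi = (theta, vartheta) under f(theta) = 0 and vartheta^2 + h(theta) = 0.

    H is the gradient of h at theta; vartheta must be nonzero so F* has full row rank.
    """
    m, k = fim.shape[0], F.shape[0]
    fim_star = np.zeros((m + 1, m + 1))
    fim_star[:m, :m] = fim                  # the dummy parameter carries no information
    F_star = np.zeros((k + 1, m + 1))
    F_star[:k, :m] = F
    F_star[k, :m] = H
    F_star[k, m] = 2.0 * vartheta
    return ccrb(fim_star, F_star)


rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
fim = A @ A.T + np.eye(3)                   # hypothetical nonsingular FIM
F = np.array([[1.0, -2.0, 0.5]])            # hypothetical equality-constraint Jacobian
H = np.array([0.3, 1.0, -1.0])              # hypothetical gradient of h at theta
big = augmented_ccrb(fim, F, H, vartheta=0.7)
```

The upper-left $3\times 3$ block of `big` should coincide with the CCRB computed without the inequality constraint.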

Equality constraints, however, do add side information to the model. Thus, it is intuitive to expect that the constrained model should result in a lower bound compared to the bound for the model without constraints. This statement was made in Gorman and Hero [23, p. 1292], but not in Marzetta [47] or in Stoica and Ng [68]. In the latter case, the statement is only true under certain conditions. Prior to establishing when the bound is lowered, a powerful lemma will be proven.

Lemma 3.16. For an arbitrary full column rank matrix $A$ and an arbitrary symmetric positive semidefinite matrix $B$, the inequality

$$A\left(A^TBA\right)^{\dagger}A^T \le B^{\dagger} \tag{3.21}$$

holds over the projection subspace of $B^{\dagger}B$, with equality if and only if $\operatorname{rank}(A^TBA) = \operatorname{rank}(B)$.

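Proof. Let $LL^T$ be the Cholesky decomposition of $B$ [20, p. 194]. Then $L \in \mathbb{R}^{m\times\operatorname{rank}(B)}$ is a full column rank lower triangular matrix with strictly positive values on the diagonal. To show the inequality, consider linear unbiased estimates of the mean in the model $y = D\beta + e$, where $e \sim \mathcal{N}\left(0, (L^TL)^{-2}\right)$ and $\beta$ is treated as the unknown parameter. In particular, $y$ is such an estimate, with variance equal to $(L^TL)^{-2}$, whereas the best linear unbiased estimate (see section 2.3.1)

$$D\hat\beta = D\left(D^T(L^TL)^2D\right)^{\dagger}D^T(L^TL)^2y$$

has variance equal to $D\left(D^T(L^TL)^2D\right)^{\dagger}D^T$. By the Gauss-Markov theorem, we have the inequality with the BLUE's variance

$$D\left(D^T(L^TL)^2D\right)^{\dagger}D^T \le (L^TL)^{-2},$$

with equality if and only if $(L^TL)D\left(D^T(L^TL)^2D\right)^{\dagger}D^T(L^TL) = I_{\operatorname{rank}(B)\times\operatorname{rank}(B)}$. Substituting $D = (L^TL)^{-1}L^TA$ into this inequality and pre- and post-multiplying both sides by $L$ and $L^T$, respectively, we have the inequality

$$B^{\dagger}BA\left(A^TLL^TA\right)^{\dagger}A^TBB^{\dagger} \le B^{\dagger}.$$

Considering quadratic forms in $w = B^{\dagger}Bv$ proves the result: since $B^{\dagger}B$ is a symmetric projection, $B^{\dagger}Bw = w$, so the left-hand side reduces to $w^TA\left(A^TBA\right)^{\dagger}A^Tw$ (recall $LL^T = B$), while, for any vector $v \in \mathbb{R}^m$, $v^TB^{\dagger}BB^{\dagger}BB^{\dagger}v = v^TB^{\dagger}v$. Moreover, from the definition of $D$, we have $(L^TL)D\left(D^T(L^TL)^2D\right)^{\dagger}D^T(L^TL) = I_{\operatorname{rank}(B)\times\operatorname{rank}(B)}$ if and only if $L^TA\left(A^TLL^TA\right)^{\dagger}A^TL = I_{\operatorname{rank}(B)\times\operatorname{rank}(B)}$, which is so if and only if $A$ and $L$ satisfy $\operatorname{rank}(A^TL) = \operatorname{rank}(B)$. □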

This lemma proves the following results, including that, in a linear subspace, $\operatorname{CCRB}(\theta)$ is less than or equal to $\operatorname{CRB}(\theta)$ in the matrix sense, and that constraints strictly on the ambiguous information have no impact on the performance of the unambiguous information.
Theorem 3.17. Let $U(\theta)$ be defined as in (3.6) from a constraint $f(\theta) = 0$. Then

$$U(\theta)\left(U^T(\theta)I(\theta)U(\theta)\right)^{\dagger}U^T(\theta) \le I^{\dagger}(\theta)$$

over the linear projection subspace defined by $I^{\dagger}(\theta)I(\theta)$.

Proof. From the lemma, defining $A = U(\theta)$ and $B = I(\theta)$ gives the inequality.

□
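Theorem 3.17 can be spot-checked numerically by projecting the difference $I^{\dagger}(\theta) - U(\theta)(U^TIU)^{\dagger}U^T(\theta)$ onto the range of $I(\theta)$ and confirming that it is positive semidefinite. The sketch assumes NumPy, randomly generated singular and nonsingular FIMs, one hypothetical linear constraint, and a pseudoinverse cutoff of `1e-10`; the name `theorem_3_17_gap` is illustrative.

```python
import numpy as np


def null_space_basis(F):
    """Orthonormal basis for the null space of F (F U = 0, U^T U = I)."""
    F = np.atleast_2d(F)
    _, _, vt = np.linalg.svd(F)
    return vt[np.linalg.matrix_rank(F):].T


def theorem_3_17_gap(fim, F, rcond=1e-10):
    """Smallest eigenvalue of P (I^+ - U (U^T I U)^+ U^T) P with P = I^+ I.

    Theorem 3.17 predicts this is >= 0 up to round-off.
    """
    U = null_space_basis(F)
    fim_pinv = np.linalg.pinv(fim, rcond=rcond)
    P = fim_pinv @ fim                       # projection onto the range of I(theta)
    bound = U @ np.linalg.pinv(U.T @ fim @ U, rcond=rcond) @ U.T
    gap = P @ (fim_pinv - bound) @ P
    return np.linalg.eigvalsh((gap + gap.T) / 2).min()


rng = np.random.default_rng(0)
L = rng.standard_normal((4, 2))
singular_fim = L @ L.T                       # hypothetical rank-2 FIM
A = rng.standard_normal((4, 4))
regular_fim = A @ A.T + np.eye(4)            # hypothetical nonsingular FIM
F = rng.standard_normal((1, 4))              # one hypothetical linear constraint
gaps = [theorem_3_17_gap(singular_fim, F), theorem_3_17_gap(regular_fim, F)]
```

For the nonsingular FIM the projection is the identity, so this reduces to the familiar statement $\operatorname{CCRB}(\theta) \le I^{-1}(\theta)$.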

The conditions under which $U(\theta)\left(U^T(\theta)I(\theta)U(\theta)\right)^{\dagger}U^T(\theta)$ and $I^{\dagger}(\theta)$ are nontrivial bounds are detailed in corollary 3.12 and section 2.1.1, respectively. The projection space of $I^{\dagger}(\theta)I(\theta)$ corresponds to the components of $\theta$ that are identifiable without the constraints (e.g., see [18, section 4]). The theorem establishes that the constraints can only lower the bound, and thereby increase performance potential, for (functions of) parameters that are already identifiable. This holds regardless of whether the Fisher information is singular or whether the parameters are identifiable under the constraints (i.e., whether $U^T(\theta)I(\theta)U(\theta)$ is singular; see Theorem 3.24). Theorem 3.17 is more general than the result shown by Ash,¹⁰ which is given by the following corollary.

¹⁰ Although not stated in this manner, this result appears in Ash's thesis [5, equation (3.63)] as well as in a publication of his third chapter in Ash and Moses [6, equation (63)]. The results from Lemma 3.16 are more general than those in [5] or [6].

Corollary 3.18 (Ash). Provided no nonzero linear combination of the columns of $U(\theta)$ lies in the null space of the Fisher information matrix,

$$U(\theta)\left(U^T(\theta)I(\theta)U(\theta)\right)^{-1}U^T(\theta) \le I^{\dagger}(\theta)$$

over the linear projection subspace defined by $I^{\dagger}(\theta)I(\theta)$.

Proof. Since $U(\theta)$ is full column rank and no nonzero vector in the range space of its columns lies in the null space of the FIM, $U^T(\theta)I(\theta)U(\theta)$ is nonsingular, and the result follows from Theorem 3.17. □

Furthermore, if the Fisher information matrix is nonsingular, then $I^{\dagger}(\theta)I(\theta)$ is the identity matrix, so the inequality holds as a quadratic form over $\mathbb{R}^m$ and

$$U(\theta)\left(U^T(\theta)I(\theta)U(\theta)\right)^{-1}U^T(\theta) \le I^{-1}(\theta).$$

This was also cleverly shown by Gorman and Hero [23, (44) in remark 4], again with the unnecessary assumption that $I(\theta)$ is regular (nonsingular). Next, we use the lemma to observe the existence of non-informative constraints.

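Corollary 3.19. Assume the row vectors of $F(\theta) \in \mathbb{R}^{k\times m}$ form a linearly independent basis for the null space of $I(\theta)$. Then $F_1(\theta) \in \mathbb{R}^{k_1\times m}$ is a linear combination of a submatrix of $F(\theta)$ if and only if

$$U_1(\theta)\left(U_1^T(\theta)I(\theta)U_1(\theta)\right)^{\dagger}U_1^T(\theta) = I^{\dagger}(\theta),$$

where $U_1(\theta)$ is defined as in (3.6) relative to $F_1(\theta)$.

Proof. Without loss of generality, partition $F$ as $F(\theta) = \begin{bmatrix}F_1(\theta)\\ F_2(\theta)\end{bmatrix}$. Then, if $U(\theta)$ is defined as in (3.6), define

$$U_1(\theta) = \begin{bmatrix}U(\theta), & \left(I_{m\times m} - F_1^T\left(F_1F_1^T\right)^{-1}F_1\right)F_2^T(\theta)D(\theta)\end{bmatrix},$$

where $D(\theta)$ represents a Gram-Schmidt processing matrix which orthonormalizes the column vectors of $\left(I_{m\times m} - F_1^T\left(F_1F_1^T\right)^{-1}F_1\right)F_2^T(\theta)$ (these are already orthogonal to the column vectors of $U(\theta)$). This satisfies (3.6). Since $\operatorname{span}(I(\theta)) \subseteq \operatorname{span}(U_1(\theta))$, we have $\operatorname{rank}(U_1^T(\theta)I(\theta)) = \operatorname{rank}(I(\theta))$, and thus, by lemma 3.16,

$$U_1(\theta)\left(U_1^T(\theta)I(\theta)U_1(\theta)\right)^{\dagger}U_1^T(\theta) = I^{\dagger}(\theta).$$

□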
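A minimal numerical illustration of Corollary 3.19, assuming NumPy, a randomly generated rank-2 FIM in four dimensions, and the hypothetical helper `ccrb_pinv`. Constraints whose rows lie in the null space of $I(\theta)$ (all of them, or a submatrix) leave $I^{\dagger}(\theta)$ unchanged, whereas a random constraint that touches identifiable directions changes the matrix.

```python
import numpy as np


def null_space_basis(F):
    """Orthonormal basis for the null space of F (F U = 0, U^T U = I)."""
    F = np.atleast_2d(F)
    _, _, vt = np.linalg.svd(F)
    return vt[np.linalg.matrix_rank(F):].T


def ccrb_pinv(fim, F, rcond=1e-10):
    """U (U^T I U)^+ U^T, the pseudoinverse form of the CCRB."""
    U = null_space_basis(F)
    return U @ np.linalg.pinv(U.T @ fim @ U, rcond=rcond) @ U.T


rng = np.random.default_rng(1)
L = rng.standard_normal((4, 2))
fim = L @ L.T                               # hypothetical rank-2 FIM
fim_pinv = np.linalg.pinv(fim, rcond=1e-10)
F = null_space_basis(L.T).T                 # rows form a basis for the null space of fim
F_random = rng.standard_normal((1, 4))      # constraint touching identifiable directions

full = ccrb_pinv(fim, F)           # every ambiguity constrained
partial = ccrb_pinv(fim, F[:1])    # a submatrix of F
other = ccrb_pinv(fim, F_random)   # an informative constraint
```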

This corollary details the necessary and sufficient characteristics for a constraint to be non-informative. Implicit in the lemma and the above corollary is the CCRB's invariance to linear combinations of columns of $U(\theta)$ that lie in the null space of $I(\theta)$. The specific case when $\operatorname{span}(I(\theta)) = \operatorname{span}(U(\theta))$ is also shown by Ash in [5, equation (3.73)] and [6, equation (73)].¹¹

¹¹ In Ash's thesis and paper, this scenario was referred to as the minimally constrained case, where all of the unknown information ambiguities, and only those, are constrained. However, in this current work, the minimal number of constraints is zero (the degenerate constraint case of example 3.13); thus it makes more sense to refer to this scenario as a non-informative constraint.

The CCRB can be interpreted geometrically, as in Figure 3.1, as a contraction of the information ambiguity to find the bound, followed by an expansion. The column vectors of $U(\theta)$, being in the null space of the Jacobian of the constraints, restrict the information in $\Theta$ to the constraint space $\Theta_f$. The bound can be found from (the inverse of) the information in $\Theta_f$, and is then projected back into the original space of the parameters of interest. This down-and-up projection is easiest to observe for linear constraints, where any local reparameterization is also a global reparameterization.

Example 3.20. Assume the parameters satisfy the linear constraint

$$f(\theta) = F\theta + v = 0.$$

In this case, solutions for $\theta$ are of the form $\theta = -F^T\left(FF^T\right)^{-1}v + U\xi = g(\xi)$, where $U$ is defined as in (3.6). That this is a one-to-one correspondence locally (or in this case globally) is made clear by the existence of the inverse $g^{-1}(\theta) = U^T\theta = \xi$. Then, since $G(\xi) = U$, we have from (3.9)

$$\operatorname{CCRB}(\theta) = U\left(U^TI(\theta)U\right)^{-1}U^T,$$

which is exactly the CCRB provided in (3.5).
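For the linear constraint of example 3.20, the global reparameterization and its inverse can be sketched directly. This assumes NumPy, an SVD-based orthonormal $U$, and a hypothetical constraint $\theta_1 + \theta_2 + \theta_3 = 1$; the function name `linear_reparameterization` is illustrative. The inverse $g^{-1}(\theta) = U^T\theta$ is exact because $U^TF^T = (FU)^T = 0$.

```python
import numpy as np


def linear_reparameterization(F, v):
    """Global map g(xi) = -F^T (F F^T)^{-1} v + U xi for F theta + v = 0, and its inverse."""
    F = np.atleast_2d(F)
    _, _, vt = np.linalg.svd(F)
    U = vt[np.linalg.matrix_rank(F):].T              # orthonormal basis, F U = 0
    theta0 = -F.T @ np.linalg.solve(F @ F.T, v)      # minimum-norm point of the constraint set

    def g(xi):
        return theta0 + U @ xi

    def g_inv(theta):
        return U.T @ theta                           # exact because U^T theta0 = 0

    return g, g_inv, U


F = np.array([[1.0, 1.0, 1.0]])   # hypothetical constraint theta1 + theta2 + theta3 = 1
v = np.array([-1.0])
g, g_inv, U = linear_reparameterization(F, v)
xi = np.array([0.2, -0.4])
theta = g(xi)                     # a point on the constraint plane
```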

Alternatively, another interpretation [9] is that the CCRB is less than (in a matrix sense) the CRB because the performance bound is over an expanded class of estimators. The (unbiased) CRB is a bound on the mean-square error of estimators that are unbiased on $\Theta$, whereas the CCRB is a bound for estimators that only need to be unbiased on $\Theta_f$ and not on the whole set $\Theta$.

3.1.5 Derivation of $g(\xi)$

In the proof of the CCRB, as well as in its application, an explicit reparameterization $g_\theta$ proved unnecessary. However, in examples 3.4 and 3.20 of the previous section, we presented scenarios where, given the constraint function $f$, we were able to define a locally equivalent continuously differentiable $g_\theta$. There may exist other scenarios where it is desirable to obtain $g_\theta$ explicitly.

In this section, we present two procedures to do so. First, we detail an approach using the Taylor expansion of $g_\theta$ given explicit knowledge of $U(\theta')$. Then, we demonstrate an approach based on fixed point methods.
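3.1.5.1 A Taylor series derivation

The Taylor series expansion of $\theta' = g_\theta(\xi')$ about $\xi$ is given by

$$g_\theta(\xi') = \theta + G_\theta(\xi)(\xi' - \xi) + \cdots,$$

where $\theta = g_\theta(\xi)$. For any differentiable function $h: \mathbb{R}^m \to \mathbb{R}$, we have

$$\frac{\partial h(\theta')}{\partial\xi'^T} = \frac{\partial h(\theta')}{\partial\theta'^T}G_\theta(\xi'). \tag{3.22}$$

Since the implicit function from $\mathbb{R}^{m-k}$ into $\Theta_f$ is not unique, we can choose a reparameterization which uses the transformation matrix $S(\theta) = I_{m-k}$, i.e., we choose a reparameterization with Jacobian $G_\theta(\xi') = U(g_\theta(\xi'))$ for any null space matrix $U(\theta)$ that satisfies (3.6), and we choose $\xi = 0$. With this selection of the reparameterization, the $r$th order derivatives of $g_\theta$ are the $(r-1)$st order derivatives of the elements of $U(\theta')$ with respect to $\xi'$, which are to be evaluated at $0$ for the coefficients in the Taylor series.

Example 3.21. Reviewing example 3.4, note

$$U(\theta') = \begin{bmatrix}\theta_2'\\ -\theta_1'\end{bmatrix} = \begin{bmatrix}0 & 1\\ -1 & 0\end{bmatrix}\theta',$$

and hence

$$\frac{\partial U(\theta')}{\partial\theta'^T} = \begin{bmatrix}0 & 1\\ -1 & 0\end{bmatrix}$$

is independent of $\theta'$, so using (3.22) as a reference, we have

$$\frac{d^rU(\theta')}{d\xi'^r} = \frac{d^{r-1}}{d\xi'^{r-1}}\left(\frac{\partial U(\theta')}{\partial\theta'^T}U(\theta')\right) = \left(\frac{\partial U(\theta')}{\partial\theta'^T}\right)^rU(\theta').$$

With a natural selection of $\theta = g_\theta(0) = (1, 0)^T$ as an initial value for the Taylor series, the reparameterized function is found to be

$$\theta' = g(\xi') = \sum_{r=0}^{\infty}\left(\frac{\partial U(\theta')}{\partial\theta'^T}\right)^r\theta\,\frac{\xi'^r}{r!} = \begin{bmatrix}\sum_{r=0}^{\infty}(-1)^r\dfrac{\xi'^{2r}}{(2r)!}\\[2ex] \sum_{r=0}^{\infty}(-1)^{r+1}\dfrac{\xi'^{2r+1}}{(2r+1)!}\end{bmatrix} = \begin{bmatrix}\cos(\xi')\\ -\sin(\xi')\end{bmatrix}.$$

This particular choice of $\theta$ produces a reparameterization $g_\theta$ in agreement with the one chosen in (3.8). Any alternative reparameterization can be found utilizing an alternative transformation matrix $S(\theta')$ or an alternative initialization.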


Such an approach will only derive a local bijective map, and the Taylor series may not converge to a known functional form for a given constraint $f$. When $U(\theta')$ is not known as a function of $\theta'$, numerical techniques are available to find the derivatives of $U(\theta')$ with respect to $\xi'$ using the equation $F(\theta')U(\theta') = 0$.


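3.1.5.2 A fixed point derivation

More commonly, a fixed point approach is taken to the derivation of an implicit function. As in appendix A.2, the parameter vector is partitioned as $\theta = \begin{bmatrix}\theta_1\\ \theta_2\end{bmatrix}$ and the constraint function is rewritten as $f^*: \mathbb{R}^{m-k}\times\mathbb{R}^k \to \mathbb{R}^k$, defined as $f^*(\theta_1, \theta_2) = f\left(\begin{bmatrix}\theta_1\\ \theta_2\end{bmatrix}\right)$. If

$$f_2^*(\theta_1, \theta_2) = \left.\frac{\partial f^*(\theta_1, \theta_2')}{\partial\theta_2'^T}\right|_{\theta_2'=\theta_2}$$

is nonsingular, then there exists a unique continuous function $\theta_2 = \theta_2(\theta_1')$ about $\theta_1$ such that $f^*(\theta_1', \theta_2(\theta_1')) = 0$. This is merely a particular variation of the implicit function theorem. One proof of this version proves the existence of a fixed point in the contraction map

$$\Phi(\theta_2(\theta_1')) = \theta_2(\theta_1') - D^{-1}(\theta_1, \theta_2)f^*(\theta_1', \theta_2(\theta_1')),$$

where $D(\theta_1, \theta_2) = f_2^*(\theta_1, \theta_2)$. This motivates the use of the iteration

$$\theta_2^{(r+1)}(\theta_1') = \theta_2^{(r)}(\theta_1') - D^{-1}(\theta_1, \theta_2)f^*(\theta_1', \theta_2^{(r)}(\theta_1'))$$

to generate the (fixed point) implicit function. The iteration is essentially an application of Newton's method [64], with the derivative held fixed at the base point $(\theta_1, \theta_2)$.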

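Example 3.22. Reviewing example 3.4 again, we have the constraint $f^*(\theta_1, \theta_2) = \theta_1^2 + \theta_2^2 - 1 = 0$ and wish to find a local reparameterization at $(\theta_1, \theta_2) = (1, 0)$. First note that

$$f_1^*(\theta_1, \theta_2) = 2\theta_1 = 2, \qquad f_2^*(\theta_1, \theta_2) = 2\theta_2 = 0,$$

so we only have nonsingularity with respect to $\theta_1$. Hence, we cannot find a function for $\theta_2$ in terms of $\theta_1$ using this approach at $(1, 0)$, but we can find $\theta_1'(\theta_2')$ there. Defining $D(\theta_1, \theta_2) = f_1^*(\theta_1, \theta_2) = 2$ and initializing with $\theta_1'^{(1)}(\theta_2') = 1$, the fixed point method dictates the next iterate to be

$$\theta_1'^{(2)}(\theta_2') = \theta_1'^{(1)}(\theta_2') - D^{-1}(\theta_1, \theta_2)\left(\left(\theta_1'^{(1)}(\theta_2')\right)^2 + \theta_2'^2 - 1\right) = 1 - \frac{\theta_2'^2}{2}.$$

Continuing, the third iterate is

$$\theta_1'^{(3)}(\theta_2') = 1 - \frac{\theta_2'^2}{2} - \frac{\theta_2'^4}{8},$$

and the fourth iterate is

$$\theta_1'^{(4)}(\theta_2') = 1 - \frac{\theta_2'^2}{2} - \frac{\theta_2'^4}{8} - \frac{\theta_2'^6}{16} - \frac{\theta_2'^8}{128}.$$

As expected, as $r \to \infty$, $\theta_1'^{(r)}(\theta_2')$ approaches the limit function

$$\theta_1'(\theta_2') = \sqrt{1 - \theta_2'^2} = 1 - \frac{\theta_2'^2}{2} - \frac{\theta_2'^4}{8} - \frac{\theta_2'^6}{16} - \frac{5\theta_2'^8}{128} + O\left(\theta_2'^{10}\right)$$

near $\theta_2' = 0$.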
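The iterates of example 3.22 can be generated mechanically. The sketch below assumes NumPy's `numpy.polynomial.polynomial` module to carry each iterate as a coefficient array in increasing powers of $\theta_2'$, and also evaluates the same iteration at a single numeric $\theta_2'$; the function names `symbolic_iterates` and `numeric_theta1` are hypothetical.

```python
import math

import numpy as np
from numpy.polynomial import polynomial as P


def symbolic_iterates(n_iter):
    """Iterates theta1^{(r)}(theta2'), r = 1..n_iter, as coefficient arrays.

    Coefficients are in increasing powers of theta2'. The update is
    x <- x - D^{-1} (x^2 + theta2'^2 - 1) with D = 2 frozen at the base point (1, 0).
    """
    D = 2.0
    shift = np.array([-1.0, 0.0, 1.0])          # theta2'^2 - 1
    x = np.array([1.0])                          # theta1^{(1)}(theta2') = 1
    iterates = [x]
    for _ in range(n_iter - 1):
        residual = P.polyadd(P.polymul(x, x), shift)
        x = P.polysub(x, residual / D)
        iterates.append(x)
    return iterates


def numeric_theta1(theta2, n_iter=60):
    """Same fixed-point iteration evaluated at a single numeric theta2'."""
    x = 1.0
    for _ in range(n_iter):
        x -= (x * x + theta2**2 - 1.0) / 2.0
    return x


iterates = symbolic_iterates(4)
approx_error = abs(numeric_theta1(0.3) - math.sqrt(1.0 - 0.3**2))
```

The coefficient arrays reproduce the second, third, and fourth iterates above, and the numeric iteration converges to $\sqrt{1 - \theta_2'^2}$ for small $|\theta_2'|$, where the frozen-derivative update is a contraction.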


3.2 Identifiability

Identifiability conditions based on the CRB (or FIM) were detailed in Section 2.2, and the definitions of local and strong identifiability are given therein. In this section, the identifiability of parameters under functional equality constraints is considered.


3.2.1 Local identifiability

To establish a new identifiability criterion from the CCRB, we will first examine an existing criterion. Rothenberg [58, theorem 6] developed conditions for identifiability of a parameter vector $\theta$ under the constraints $f(\theta) = 0$, which were later partially re-derived by Crowder [18, lemma 1 is the "only if" portion of the theorem statement].

Theorem 3.23 (Rothenberg-Crowder). Assume both $F(\theta')$ and

$$M(\theta') = \begin{bmatrix}I(\theta')\\ F(\theta')\end{bmatrix}$$

have constant rank in a local neighborhood about $\theta$. Then $\theta$ is locally identifiable if and only if $M(\theta)$ has full column rank $m$.

The proof of this theorem is based on the unconstrained proof of Theorem 2.5. An immediate implication is that if the FIM $I(\theta)$ is regular, then not only is the unconstrained model locally identifiable, but the constrained model is as well. Otherwise, if the FIM is singular, then the constraints must be such that the row vectors of their Jacobian $F(\theta)$ eliminate the null space of the FIM. In doing so, the constraints eliminate whatever inherent ambiguity was in the model that led to local unidentifiability and a singular FIM (Theorem 2.5). If the row vectors of the Jacobian did not eliminate the null space of the Fisher information, then some nonzero linear combination of the column vectors of $U(\theta)$ would lie in the null space of $I(\theta)$, so that $U^T(\theta)I(\theta)U(\theta)$ is singular, as shall be shown shortly.

Additionally, regardless of the information, in the trivial case where the Jacobian of the constraints has full column rank, the theorem says the model is locally identifiable. Indeed, when $\operatorname{rank}(F(\theta)) = m$, the constraints completely determine the parameter (e.g., see example 3.14).

Rothenberg's theorem for unconstrained identifiability is useful in establishing an alternative criterion for identifiability in relation to a component of the CCRB.

Theorem 3.24. Let $\theta \in \Theta_f$ and assume $U^T(\theta')I(\theta')U(\theta')$ has constant rank in a neighborhood of $\theta$. Then $\theta$ is locally identifiable if and only if $U^T(\theta)I(\theta)U(\theta)$ is regular.

Proof. Let $g_\theta$ satisfy Theorem 3.3. If $U^T(\theta')I(\theta')U(\theta')$ has constant rank in a local neighborhood about $\theta = g_\theta(\xi)$, then $I(\xi')$ has constant rank in a local neighborhood about $\xi$. And since $g_\theta$ is injective, $\theta$ is locally identifiable in $\Theta_f$ if and only if $\xi$ is locally identifiable in $\mathbb{R}^{m-k}$. By Theorem 2.5, $\xi$ is locally identifiable if and only if $I(\xi)$ is regular. And $I(\xi)$ is regular if and only if $U^T(\theta)I(\theta)U(\theta)$ is regular. □

This is the theorem corresponding to Rothenberg's Theorem 2.5 and agrees with his Theorem 3.23, although the proof does not rely on the latter theorem's result because the implicit function $g_\theta$ simplifies the approach. It is, however, possible to prove a more general result connecting the rank of the $M(\cdot)$ matrix of Theorem 3.23 to the implicit Fisher information $U^T(\theta)I(\theta)U(\theta)$.

Theorem 3.25. Assume $U^T(\theta')I(\theta')U(\theta')$ has constant rank in a neighborhood of $\theta$. Then

$$\operatorname{nullity}(M(\theta')) = \operatorname{nullity}\left(U^T(\theta')I(\theta')U(\theta')\right).$$

Proof. First, for some fixed $\theta'$, assume $M(\theta')$ is not full column rank and the vectors $v_1, \ldots, v_r$ are linearly independent and span the null space of $M(\theta')$, i.e., for $l = 1, \ldots, r$,

$$I(\theta')v_l = 0, \qquad F(\theta')v_l = 0.$$

Let $U(\theta')$ be a matrix defined as in (3.6). Since $F(\theta')v_l = 0$, each $v_l = U(\theta')w_l$ for some $w_1, \ldots, w_r \in \mathbb{R}^{m-k}$. Now if $\sum_{l=1}^r\gamma_lw_l = 0$, then $\sum_{l=1}^r\gamma_lv_l = U(\theta')\sum_{l=1}^r\gamma_lw_l = 0$, which implies $\gamma_1 = \cdots = \gamma_r = 0$ since the $v_l$ are linearly independent. Hence, $w_1, \ldots, w_r$ are also linearly independent and span an $r$-dimensional subspace of $\mathbb{R}^{m-k}$. Moreover, $I(\theta')U(\theta')w_l = I(\theta')v_l = 0$, so $U^T(\theta')I(\theta')U(\theta')w_l = 0$ for $l = 1, \ldots, r$. This proves the implication, namely, that $\operatorname{nullity}\left(U^T(\theta')I(\theta')U(\theta')\right) \ge \operatorname{nullity}(M(\theta'))$.

To show the converse, assume the vectors $w_1, \ldots, w_r$ are linearly independent and span the null space of $U^T(\theta')I(\theta')U(\theta')$. Define $v_l = U(\theta')w_l$ for $l = 1, \ldots, r$. Since $I(\theta')$ is positive semidefinite, $w_l^TU^T(\theta')I(\theta')U(\theta')w_l = 0$ implies $I(\theta')v_l = 0$; also $F(\theta')v_l = 0$, so $M(\theta')v_l = 0$ for $l = 1, \ldots, r$. Note that, for any $\gamma_1, \ldots, \gamma_r$, $\sum_{l=1}^r\gamma_lv_l = U(\theta')\sum_{l=1}^r\gamma_lw_l = 0$ if and only if $\sum_{l=1}^r\gamma_lw_l = 0$ (since $U(\theta')$ is full column rank), which is true if and only if $\gamma_1 = \cdots = \gamma_r = 0$. This proves the converse, i.e., $\operatorname{nullity}(M(\theta')) \ge \operatorname{nullity}\left(U^T(\theta')I(\theta')U(\theta')\right)$. □
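Theorem 3.25 can be illustrated at a single point with a hypothetical diagonal FIM in which $\theta_3$ and $\theta_4$ carry no information. The sketch assumes NumPy, an absolute rank tolerance of `1e-10`, and the illustrative helper names `nullity` and `compare_nullities`; $M(\theta)$ is stacked as in Theorem 3.23.

```python
import numpy as np


def null_space_basis(F):
    """Orthonormal basis for the null space of F (F U = 0, U^T U = I)."""
    F = np.atleast_2d(F)
    _, _, vt = np.linalg.svd(F)
    return vt[np.linalg.matrix_rank(F):].T


def nullity(mat, tol=1e-10):
    """Dimension of the null space (number of columns minus rank)."""
    return int(mat.shape[1] - np.linalg.matrix_rank(mat, tol=tol))


def compare_nullities(fim, F):
    """Return (nullity(M), nullity(U^T I U)) with M = [I; F] as in Theorem 3.23."""
    U = null_space_basis(F)
    M = np.vstack([fim, F])
    return nullity(M), nullity(U.T @ fim @ U)


fim = np.diag([1.0, 1.0, 0.0, 0.0])   # hypothetical FIM: theta3, theta4 carry no information
cases = {
    "fix theta3": np.array([[0.0, 0.0, 1.0, 0.0]]),
    "fix theta3 and theta4": np.array([[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]),
    "fix theta1": np.array([[1.0, 0.0, 0.0, 0.0]]),
}
results = {name: compare_nullities(fim, F) for name, F in cases.items()}
```

Only the constraint that fixes both uninformative components removes every ambiguity; in each case the two nullities agree, as the theorem predicts.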

The theorem essentially states that, in the local neighborhood where the implicit function $g$ is defined, $M(\theta)$ has full column rank if and only if $U^T(\theta)I(\theta)U(\theta)$ does, for any $\theta \in \Theta_f$. As a consequence, we have the following corollary.

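Corollary 3.26. If $I(\theta)$ is regular, then $U^T(\theta)I(\theta)U(\theta)$ is also regular. And if $\theta$ is locally identifiable in $\Theta$, then $\theta$ is locally identifiable in $\Theta_f$.

Proof. If $\theta$ is locally identifiable, then $\operatorname{nullity}(I(\theta)) = 0$ by theorem 2.5. And if $\operatorname{nullity}(I(\theta)) = 0$, then $\operatorname{nullity}(M(\theta)) = 0$, so $\operatorname{nullity}\left(U^T(\theta)I(\theta)U(\theta)\right) = 0$ by theorem 3.25. Finally, if $\operatorname{nullity}\left(U^T(\theta)I(\theta)U(\theta)\right) = 0$, then $\theta$ is locally identifiable in $\Theta_f$ by theorem 3.24.

□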
3.2.1.1 Local identifiability in the Aitchison-Silvey-Crowder CCRB formula

In subsection 3.1.2.2, an alternative form of the CCRB is presented in which a loaded Fisher information is used in place of the FIM to resolve issues of singularity of the FIM. The Aitchison-Silvey loaded FIM is $I(\theta) + F^T(\theta)F(\theta)$ [4], whereas the Crowder loaded FIM is $I(\theta) + F^T(\theta)KF(\theta)$ [18], where $K$ is chosen such that the loaded FIM is full rank. The following theorem, which connects local identifiability and the Aitchison-Silvey-Crowder CCRB, shows that under certain conditions the matrix $K$ is unnecessary. This result is hinted at, but not clearly stated, in [18, lemma 6].

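Theorem 3.27. The Aitchison-Silvey-Crowder loaded FIM $I(\theta) + F^T(\theta)F(\theta)$ is nonsingular if and only if $M(\theta)$ is full column rank. Hence, if $I(\theta') + F^T(\theta')F(\theta')$ has constant rank locally about $\theta$, then $\theta$ is identifiable if and only if $I(\theta) + F^T(\theta)F(\theta)$ is nonsingular.

Proof. To show the contrapositive, assume $M(\theta)$ is not full column rank and $v$ is a nontrivial vector such that $M(\theta)v = 0$. Then $I(\theta)v = 0$ and $F(\theta)v = 0$. Therefore, $v$ is in the null space of the Gram matrix $I(\theta) + F^T(\theta)F(\theta)$. To show the converse, assume $\left(I(\theta) + F^T(\theta)F(\theta)\right)v = 0$ for some nontrivial vector $v \in \mathbb{R}^m$. Then, since

$$I(\theta) + F^T(\theta)F(\theta) = \begin{bmatrix}I^{1/2}(\theta) & F^T(\theta)\end{bmatrix}\begin{bmatrix}I^{1/2}(\theta)\\ F(\theta)\end{bmatrix},$$

we have $0 = v^T\left(I(\theta) + F^T(\theta)F(\theta)\right)v = \|I^{1/2}(\theta)v\|^2 + \|F(\theta)v\|^2$, so we must have $I^{1/2}(\theta)v = 0$ and $F(\theta)v = 0$. Hence, $I(\theta)v = I^{1/2}(\theta)I^{1/2}(\theta)v = 0$, which implies $M(\theta)v = 0$ and $M(\theta)$ is not full column rank. □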
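A small numerical check of Theorem 3.27, assuming NumPy, a hypothetical diagonal FIM with a two-dimensional null space, and an absolute rank tolerance of `1e-10`; the helper names `loaded_fim` and `has_full_column_rank` are illustrative. Each pair records whether $M(\theta)$ has full column rank and whether the Aitchison-Silvey loaded FIM is nonsingular.

```python
import numpy as np


def loaded_fim(fim, F):
    """Aitchison-Silvey loaded FIM I(theta) + F^T(theta) F(theta)."""
    return fim + F.T @ F


def has_full_column_rank(mat, tol=1e-10):
    return bool(np.linalg.matrix_rank(mat, tol=tol) == mat.shape[1])


fim = np.diag([1.0, 1.0, 0.0, 0.0])   # hypothetical FIM with a 2-D null space
constraints = [
    np.array([[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]),   # resolves the ambiguity
    np.array([[0.0, 0.0, 1.0, 1.0]]),                           # resolves only part of it
    np.array([[1.0, 0.0, 0.0, 0.0]]),                           # constrains identifiable theta1
]
checks = []
for F in constraints:
    M = np.vstack([fim, F])
    checks.append((has_full_column_rank(M), has_full_column_rank(loaded_fim(fim, F))))
```

In every case the two tests agree, which is the equivalence the theorem asserts.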
3.2.2 Strong Identifiability

Restricting the discussion to normal distributions in this section, we can extend the equivalence criterion between regularity and strong identifiability as defined in section 2.2.2. That is, assume $x \sim \mathcal{N}(\mu(\theta), \Sigma(\theta))$ with $\mu(\theta) \in \mathbb{R}^p$, with the elements of the mean and variance explicitly defined by a map $\psi: \Theta \to \mathbb{R}^q$, where $q \le p + p(p+1)/2$, and assume $m \le q$.

Theorem 3.28. If $\theta$ is strongly identifiable on $\Theta$ and $f(\theta) = 0$, then $\theta$ is also strongly identifiable on $\Theta_f$.

Proof. If $\theta$ is strongly identifiable, then there exists a representative mapping $\psi^*$ which is injective on $\Theta$. Since $\Theta_f \subseteq \Theta$, $\psi^*$ is still injective and a representative mapping on $\Theta_f$. □

This theorem is complementary to Corollary 3.26. Essentially, the imposition of constraints does not remove identifiability (local or strong) or Fisher information regularity that already exists in a model. However, it is not always the case that the original (unconstrained) model is information regular or identifiable. The following theorem, an extension of Theorem 2.6, connects the notion of strong identifiability with regularity of $U^T(\theta)I(\theta)U(\theta)$.

Theorem 3.29. Assume $\psi$ is a holomorphic mapping of $z \in \Omega \subseteq \bigcup_{\alpha\in A}\Theta_\alpha$ into $\mathbb{C}^q$, where $\Theta_f \subseteq \Omega \subseteq \mathbb{C}^m$ and $\Theta_\alpha$ is open in $\mathbb{C}^m$ for each $\alpha$. Then

(a) if $U^T(z)I(z)U(z)$ is regular, there exists a strongly identifiable open neighborhood about $z$, and

(b) if there exists a representative mapping $\psi_\alpha^*: \Theta_\alpha \to \mathbb{C}^q$ for each $\alpha$, then the matrix $U^T(z)I(z)U(z)$ is regular for every $z \in \Omega$.

Proof. By the implicit function theorem (Theorem 3.3), for any matrix $U(\theta)$ whose columns form a basis for the null space of the Jacobian of $f(\theta)$, there exist an open set $V \ni \theta$, an open set $W \subseteq \mathbb{R}^{m-k}$, and some transformation $g_\theta: W \to \mathbb{R}^m$ such that $\theta = g_\theta(\xi)$ for some $\xi \in W$. In particular, in this reduced parameter space, there exists a FIM such that $I(\xi) = U^T(\theta)I(\theta)U(\theta)$. Since Theorem 2.6 applies to $I(\xi)$, the result for $U^T(\theta)I(\theta)U(\theta)$ is proven. □

Regardless of the regularity of the FIM, only the regularity of $U^T(\theta)I(\theta)U(\theta)$ determines strong identifiability under constraints for normal distributions, given proper holomorphic function(s).

3.3 Linear Model

Assume the observations $x$ model a linear function of the parameters $\theta$, as in

$$x = H\theta + w, \tag{3.23}$$

where $H$ is a full column rank $n\times m$ observation matrix consisting of known elements and $w$ is a random noise vector with mean zero and known variance $C$. As noted in section 2.3.1, the (weighted) LSE

$$\hat\theta_{LS}(x) = \left(H^TC^{-1}H\right)^{-1}H^TC^{-1}x$$

is the BLUE. The LSE has a variance of $Q^{-1}$, where $Q = H^TC^{-1}H$. In addition, we assume a linear constraint

$$f(\theta) = F\theta + v = 0,$$

where $F$ is a known full row rank $k\times m$ matrix and $v$ is a known shift vector. For a linear constraint, the Jacobian $F(\theta) = F$ does not depend on the parameter. As the linear problem is well studied, many of the results in this section are known (e.g., see [61, section 3.8] and [57, section 11.3.3]), but they are presented here from a different perspective.

3.3.1  Best  Linear  Unbiased  Estimation 

The  constrained  (weighted)  LSE  (CLSE)  is  most  often  given  by  [36,  p.  252] 

0CLs(*)  =  M*)  -  Q1Ft  (FQ^F^1  (. FGls(x )  +  v)  .  (3.24) 

A simple calculation confirms that this CLSE lies in $\ker(f) = \Theta_f$ (i.e., it satisfies the constraint), is unbiased, and has variance

$$\mathrm{Var}\left(\hat\theta_{CLS}(x)\right) = Q^{-1} - Q^{-1} F^T \left(F Q^{-1} F^T\right)^{-1} F Q^{-1}. \qquad (3.25)$$

The BLUE property of the LSE is preserved for the CLSE. This is not a surprising result, since a linear constraint on a linear model is still a linear model, as seen in figure 3.2.
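The Python/NumPy sketch below evaluates the Lagrange-form CLSE (3.24) and its variance (3.25) on randomly generated data. The sizes and the random $H$, $C$, $F$, $v$ and $x$ are illustrative assumptions and do not come from the text. The two checks confirm numerically that the estimate satisfies the constraint and that $F\,\mathrm{Var}(\hat\theta_{CLS}) = 0$, meaning there is no variability across the constraint.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 20, 4, 2                       # illustrative sizes
H = rng.standard_normal((n, m))          # full column rank with probability one
A = rng.standard_normal((n, n))
C = A @ A.T + n * np.eye(n)              # known positive definite noise covariance
F = rng.standard_normal((k, m))          # full row rank constraint matrix
v = rng.standard_normal(k)               # known shift vector
x = rng.standard_normal(n)               # one observation vector

Ci = np.linalg.inv(C)
Q = H.T @ Ci @ H                         # Q = H^T C^{-1} H
Qi = np.linalg.inv(Q)
theta_ls = Qi @ H.T @ Ci @ x             # weighted LSE

S = F @ Qi @ F.T                         # F Q^{-1} F^T
theta_cls = theta_ls - Qi @ F.T @ np.linalg.solve(S, F @ theta_ls + v)   # (3.24)
var_cls = Qi - Qi @ F.T @ np.linalg.solve(S, F @ Qi)                     # (3.25)

print(np.allclose(F @ theta_cls + v, 0))   # True: the constraint holds
print(np.allclose(F @ var_cls, 0))         # True: no variance across the constraint
```

`np.linalg.solve` is used in place of explicitly inverting $F Q^{-1} F^T$. This is the usual numerically safer choice.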

In the context of the null space approach of the CCRB, an alternative CLSE may be developed. One particular advantage of this approach is that it avoids the need to directly use and solve for Lagrange multipliers.

Figure 3.2: Projection of the observations $x$ onto the linear space $H\Theta$ and the linear constraint space $H\Theta_f$.

Note that zero solutions of $f(\theta)$ are of the form

$$\theta = -F^T \left(F F^T\right)^{-1} v + U\xi = g_\theta(\xi),$$

where $U$ satisfies the equations in (3.6), i.e., the columns of $U$ form an orthonormal basis for the null space of the row vectors of $F$, and $\xi \in \mathbb{R}^{m-k}$ is a parameter representing the projection of $\theta$ onto the constraint space $\Theta_f$. Since the Jacobian $F$ is independent of the parameter, $U(\theta) = U$, and hence $g_\theta$ does not depend on $\theta$ either. Moreover, the typical local properties of the implicit function hold globally. Substituting this solution for $\theta$, the linear model is reformulated as

$$y = HU\xi + w, \qquad (3.26)$$

where $y = x + HF^T \left(FF^T\right)^{-1} v$. Following the least squares approach, we desire a solution that minimizes the quadratic objective function

$$\hat\xi_{LS} = \arg\min_{\xi} \, (y - HU\xi)^T C^{-1} (y - HU\xi).$$

This solution must satisfy the normal equations

$$\left(U^T Q U\right) \hat\xi_{LS}(y) = U^T H^T C^{-1} y.$$

Provided the normalizing matrix $U^T Q U$ is full rank, the LSE of $\xi$ is given by

$$\hat\xi_{LS}(y) = \left(U^T Q U\right)^{-1} U^T H^T C^{-1} y$$

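and is the BLUE ((2.4) in section 2.3.1). The corresponding LSE of $\theta$ based on this null space approach is

$$\begin{aligned}
\hat\theta_{CLS}(x) &= -F^T \left(FF^T\right)^{-1} v + U \hat\xi_{LS}(y) \\
&= -F^T \left(FF^T\right)^{-1} v + U \left(U^T Q U\right)^{-1} U^T H^T C^{-1} \left(x + H F^T \left(FF^T\right)^{-1} v\right) \\
&= U \left(U^T Q U\right)^{-1} U^T Q \left(\theta_1 + F^T \left(FF^T\right)^{-1} v\right) - F^T \left(FF^T\right)^{-1} v \\
&\quad + U \left(U^T Q U\right)^{-1} U^T H^T C^{-1} \left(x - H\theta_1\right) \\
&= \theta_1 + U \left(U^T Q U\right)^{-1} U^T H^T C^{-1} \left(x - H\theta_1\right), \qquad (3.27)
\end{aligned}$$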

where $\theta_1 = -F^T \left(FF^T\right)^{-1} v + U\xi_1$ can be any point satisfying the linear constraint ($\xi_1$ is unrestricted). The last equality uses $\theta_1 + F^T\left(FF^T\right)^{-1} v = U\xi_1$. This alternative CLSE is more general than the prior formula, as it is applicable in scenarios where $H$ and $F$ are not necessarily full column rank and full row rank, respectively. As such, its variance has the more general expression

$$U \left(U^T Q U\right)^{-1} U^T,$$
which is equivalent to (3.25) when $H$ and $F$ are full column rank and full row rank, respectively. In place of the rank conditions on $H$ and $F$, the weaker necessary condition for this CLSE is that $HU$ be full column rank. If the stronger conditions on $H$ and $F$ hold, then the CLSE can be reformulated in terms of the LSE as before, e.g.,

$$\hat\theta_{CLS}(x) = \theta_1 + U \left(U^T Q U\right)^{-1} U^T Q \left(\hat\theta_{LS}(x) - \theta_1\right).$$

It is easy to confirm that $\hat\theta_{CLS}(x) \in \ker(f)$ by recalling that $FU = 0$ and $\theta_1 \in \ker(f)$.
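The Python/NumPy sketch below implements the null space CLSE (3.27). The orthonormal basis $U$ of $\ker(F)$ is read off the SVD of $F$. That is one convenient way to satisfy (3.6), and it is an assumption of the sketch. The helper names `null_space_basis` and `clse_null_space` are hypothetical, and the data are random and purely illustrative. The result is compared with the Lagrange form (3.24) and (3.25). A second feasible point $\theta_1$ is then used to show that the estimate does not depend on that choice.

```python
import numpy as np

def null_space_basis(F):
    """Orthonormal basis U of ker(F), so F @ U = 0 and U.T @ U = I (F full row rank)."""
    _, _, Vt = np.linalg.svd(F)
    return Vt[F.shape[0]:].T

rng = np.random.default_rng(1)
n, m, k = 20, 4, 2
H = rng.standard_normal((n, m))
C = np.diag(rng.uniform(0.5, 2.0, n))        # illustrative diagonal noise covariance
F = rng.standard_normal((k, m))
v = rng.standard_normal(k)
x = rng.standard_normal(n)

Ci = np.linalg.inv(C)
Q = H.T @ Ci @ H
U = null_space_basis(F)
M = U.T @ Q @ U                              # U^T Q U, nonsingular when HU has full column rank

def clse_null_space(theta_1):
    """CLSE (3.27) computed from any feasible point theta_1."""
    return theta_1 + U @ np.linalg.solve(M, U.T @ H.T @ Ci @ (x - H @ theta_1))

theta_1 = -F.T @ np.linalg.solve(F @ F.T, v)           # minimum-norm feasible point
theta_ns = clse_null_space(theta_1)
var_ns = U @ np.linalg.solve(M, U.T)                   # U (U^T Q U)^{-1} U^T

# Lagrange-multiplier form (3.24)-(3.25) for comparison.
Qi = np.linalg.inv(Q)
theta_ls = Qi @ H.T @ Ci @ x
S = F @ Qi @ F.T
theta_lag = theta_ls - Qi @ F.T @ np.linalg.solve(S, F @ theta_ls + v)
var_lag = Qi - Qi @ F.T @ np.linalg.solve(S, F @ Qi)

theta_1b = theta_1 + U @ rng.standard_normal(m - k)    # a different feasible point
print(np.allclose(theta_ns, theta_lag))                # True
print(np.allclose(var_ns, var_lag))                    # True
print(np.allclose(clse_null_space(theta_1b), theta_ns))  # True
```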

Also, as in section 2.3.1, if $HU$ is not full column rank, then for estimable functions $d^T\theta$, i.e., for vectors $d$ in the column space of $U^T H^T$, the LSE is BLUE and is given by

$$d^T \hat\theta_{CLS}(x) = d^T \theta_1 + d^T U \left(U^T Q U\right)^{\dagger} U^T H^T C^{-1} \left(x - H\theta_1\right), \qquad (3.28)$$

similar to (3.27), with variance $d^T U \left(U^T Q U\right)^{\dagger} U^T d$.

3.3.2  Uniform  Minimum  Variance  Estimation  under  Gaussian  noise 

Under the assumption that the noise is normally distributed, the (unconstrained) LSE is also the MLE. The FIM$^{12}$ is $I(\theta) = Q = H^T C^{-1} H$, and the LSE/MLE is the MVUE, being efficient with respect to $\mathrm{CRB} = I^{-1}(\theta)$. By the general principle that, for linear models with additive Gaussian noise, the LSE is the MLE, the CLSE should be the constrained MLE (CMLE), because a linear constraint essentially yields a reduced dimensional linear model, as evidenced in (3.26). As we shall see, this is indeed the case.

12 Note that neither the Fisher information, the Jacobian of the constraints, nor the null space matrix depends on the parameters in the linear model with linear constraints.

First, for the Gaussian linear model in (3.26), the pdf is

$$q(y; \xi) = (2\pi)^{-n/2} (\det C)^{-1/2} \exp\left\{-\tfrac{1}{2} (y - HU\xi)^T C^{-1} (y - HU\xi)\right\}.$$

The Fisher score $s(y; \xi) = U^T H^T C^{-1} (y - HU\xi)$ has a variance, or Fisher information, of $I(\xi) = U^T H^T C^{-1} H U$. Hence the CCRB is

$$\mathrm{CCRB}(\theta) = G(\xi) I^{-1}(\xi) G^T(\xi) = U \left(U^T H^T C^{-1} H U\right)^{-1} U^T$$

(see example 3.20), since here $G(\xi) = U$.

Maximizing the likelihood (or pdf) is equivalent to minimizing the quadratic form in the exponent. Therefore, the LSE of $\xi$ is also the MLE of $\xi$. By the invariance property of the MLE [36, 14], the CMLE of $\theta$ is

$$\hat\theta_{CML}(x) = g_\theta\left(\hat\xi_{ML}(y)\right) = \hat\theta_{CLS}(x).$$

Theorem 3.30. The CMLE is optimal for the linear model under linear constraints. Suppose the observations obey the linear model in (3.23), where $H$ is a known matrix, $\theta$ is an unknown parameter vector subject to the linear constraint $f(\theta) = F\theta + v = 0$, and $w$ is a zero-mean normal random vector with known variance $C$. Then, provided $HU$ is full column rank, where $U$ is defined by (3.6), the CMLE

$$\hat\theta_{CML}(x) = \theta_1 + U \left(U^T H^T C^{-1} H U\right)^{-1} U^T H^T C^{-1} \left(x - H\theta_1\right) \qquad (3.29)$$

is unbiased and efficient.
A formula for the CMLE based on the MLE, analogous to the formula for the CLSE based on the LSE, might seem preferable, i.e.,

$$\hat\theta_{CML}(x) = \theta_1 + U \left(U^T Q U\right)^{-1} U^T Q \left(\hat\theta_{ML}(x) - \theta_1\right), \qquad (3.30)$$

with $Q = H^T C^{-1} H$. However, this formulation requires the existence of a full rank FIM $I(\theta) = Q$ in the MLE. The benefit of using the CMLE in (3.29) rather than the CMLE in (3.30) is that the following proof does not require this assumption.

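Proof. First note that for any $\theta, \theta_1$ satisfying the constraints,

$$\theta - \theta_1 = -F^T \left(FF^T\right)^{-1} v + U\xi + F^T \left(FF^T\right)^{-1} v - U\xi_1 = U(\xi - \xi_1)$$

for some $\xi, \xi_1$. Therefore, the expected value of the CMLE is

$$\begin{aligned}
E_\theta\, \hat\theta_{CML}(x) &= \theta_1 + U \left(U^T Q U\right)^{-1} U^T Q (\theta - \theta_1) \\
&= \theta_1 + U \left(U^T Q U\right)^{-1} U^T Q U (\xi - \xi_1) \\
&= \theta_1 + U (\xi - \xi_1) \\
&= \theta_1 + \theta - \theta_1 \\
&= \theta.
\end{aligned}$$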
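Finally, the variance of the CMLE is

$$\begin{aligned}
\mathrm{Var}_\theta\left(\hat\theta_{CML}(x)\right) &= \mathrm{Var}_\theta\left[\theta_1 + U \left(U^T Q U\right)^{-1} U^T H^T C^{-1} (x - H\theta_1)\right] \\
&= \mathrm{Var}_\theta\left[U \left(U^T Q U\right)^{-1} U^T H^T C^{-1} (x - H\theta + H\theta - H\theta_1)\right] \\
&= \mathrm{Var}_\theta\left[U \left(U^T Q U\right)^{-1} U^T H^T C^{-1} (x - H\theta)\right] \\
&= U \left(U^T Q U\right)^{-1} U^T H^T C^{-1} C C^{-1} H U \left(U^T Q U\right)^{-1} U^T \\
&= U \left(U^T Q U\right)^{-1} U^T Q U \left(U^T Q U\right)^{-1} U^T \\
&= U \left(U^T Q U\right)^{-1} U^T,
\end{aligned}$$

i.e., the CCRB of $\theta$. □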

Thus, the CMLE is the MVUE for the linear model with linear constraints under a Gaussian assumption.
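The Monte Carlo sketch below (Python/NumPy) illustrates Theorem 3.30 numerically. It applies (3.29) to many simulated observation vectors and compares the sample mean and covariance of the CMLE with $\theta$ and with the CCRB. The sum-to-one constraint, the noise level, the random $H$ and the true $\theta$ are arbitrary illustrative choices. With a finite number of trials the agreement is approximate, not exact.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, k, trials = 15, 3, 1, 20000
H = rng.standard_normal((n, m))
C = 0.5 * np.eye(n)
F = np.array([[1.0, 1.0, 1.0]])           # illustrative constraint: parameters sum to one
v = np.array([-1.0])
_, _, Vt = np.linalg.svd(F)
U = Vt[k:].T                              # orthonormal basis of ker(F)

theta = np.array([0.2, 0.3, 0.5])         # true parameter, satisfies F theta + v = 0
theta_1 = -F.T @ np.linalg.solve(F @ F.T, v)
Ci = np.linalg.inv(C)
Q = H.T @ Ci @ H
ccrb = U @ np.linalg.solve(U.T @ Q @ U, U.T)
gain = U @ np.linalg.solve(U.T @ Q @ U, U.T @ H.T @ Ci)   # m x n gain matrix in (3.29)

# Draw many noisy observation vectors (one per row) and apply (3.29) to each.
L = np.linalg.cholesky(C)
X = H @ theta + rng.standard_normal((trials, n)) @ L.T
est = theta_1 + (X - H @ theta_1) @ gain.T

print(np.abs(est.mean(axis=0) - theta).max())   # small: sample mean close to theta
print(np.abs(np.cov(est.T) - ccrb).max())       # small: sample covariance close to the CCRB
```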

Additionally, when $HU$ is not full column rank and $d$ is in the column space of $U^T H^T$, the MLE of $d^T\theta$ is still the MVUE and is given by $d^T \hat\theta_{CML}(x) = d^T \hat\theta_{CLS}(x)$ from (3.28), with variance $d^T \mathrm{CCRB}(\theta)\, d$.

3.4  Constrained  Maximum  Likelihood  Estimation 

The constrained MLE (CMLE) of the parameter vector $\theta$ constrained to the manifold $\Theta_f = \{\theta' : f(\theta') = 0\}$ is the estimator in $\Theta_f$ that, given the observations $x$, maximizes the likelihood $p(x; \cdot)$. In other words, it is the maximum likelihood estimator in $\Theta_f$. Because $\log(\cdot)$ is monotonically increasing, the log-likelihood $\log p(x; \cdot)$ can equivalently be maximized. This is convenient because the Jacobian of the objective is then the Fisher score. In an optimization context, the CMLE, denoted $\hat\theta_{CML}(x)$, is the solution to the following constrained optimization problem:

$$\max_{\theta'} \; \log p(x; \theta') \quad \text{s.t.} \quad f(\theta') = 0. \qquad (3.31)$$

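Analogous to the method of maximum likelihood approach of (2.7), solutions $\hat\theta(x) = g_\theta\left(\hat\xi(x)\right)$ satisfying

$$\left.\frac{\partial}{\partial \xi'} \log p\left(x; g_\theta(\xi')\right)\right|_{\xi' = \hat\xi(x)} = 0,$$

where $g_\theta$ is defined by (3.7), are candidates to be the CMLE. More formally, a solution to this optimization problem must satisfy the Karush-Kuhn-Tucker conditions [45] for some vector of Lagrange multipliers $\lambda \in \mathbb{R}^k$, i.e.,

$$\begin{aligned}
s(x; \theta') - F^T(\theta')\, \lambda &= 0 \quad \text{(stationarity)} \qquad (3.32) \\
f(\theta') &= 0. \quad \text{(feasibility)} \qquad (3.33)
\end{aligned}$$

Any point satisfying these conditions is a stationary, feasible point.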
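For the Gaussian linear model of section 3.3.2, the KKT conditions (3.32) and (3.33) are linear in $(\theta', \lambda)$, so they can be solved as a single block system. The Python/NumPy sketch below does this on random, purely illustrative data. It then checks three things: the solution coincides with the null space CMLE (3.29), the stationarity residual vanishes, and the score at the solution is orthogonal to $\ker(F)$, i.e., $U^T s = 0$. The sign convention for $\lambda$ follows (3.32) as written above.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, k = 12, 4, 2
H = rng.standard_normal((n, m))
C = np.eye(n)                                   # white noise, for simplicity
F = rng.standard_normal((k, m))
v = rng.standard_normal(k)
x = rng.standard_normal(n)

Ci = np.linalg.inv(C)
Q = H.T @ Ci @ H

def score(theta):
    """Fisher score s(x; theta) = H^T C^{-1} (x - H theta) of the Gaussian linear model."""
    return H.T @ Ci @ (x - H @ theta)

# KKT system: Q theta + F^T lam = H^T C^{-1} x (stationarity), F theta = -v (feasibility).
kkt = np.block([[Q, F.T], [F, np.zeros((k, k))]])
sol = np.linalg.solve(kkt, np.concatenate([H.T @ Ci @ x, -v]))
theta_kkt, lam = sol[:m], sol[m:]

# Null space CMLE (3.29) for comparison.
_, _, Vt = np.linalg.svd(F)
U = Vt[k:].T
theta_1 = -F.T @ np.linalg.solve(F @ F.T, v)
theta_ns = theta_1 + U @ np.linalg.solve(U.T @ Q @ U, U.T @ score(theta_1))

print(np.allclose(theta_kkt, theta_ns))                 # True
print(np.allclose(score(theta_kkt) - F.T @ lam, 0))     # True: stationarity (3.32)
print(np.allclose(U.T @ score(theta_kkt), 0))           # True: score orthogonal to ker(F)
```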

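Since $\hat\theta_{CML}(x) \in \Theta_f$, the implicit function theorem implies that there exist an open set $O \subset \Theta_f$ containing $\hat\theta_{CML}(x)$, an open set $P \subset \mathbb{R}^{m-k}$, and a continuously differentiable bijection $g_{\hat\theta} : P \to O$ such that $\hat\theta(x) = g_{\hat\theta}\left(\hat\xi(x)\right)$ for some $\hat\xi(x) \in P$. If $\hat\theta(x)$ is a maximizer of the likelihood function $p(x; \theta)$ in the constraint set $\Theta_f$, then the likelihood $q(x; \xi') = p\left(x; g_{\hat\theta}(\xi')\right)$ has a maximum at $\hat\xi(x)$ in $P$ (i.e., a local maximum at $\hat\xi(x)$ in $\mathbb{R}^{m-k}$). It cannot be said that $q(x; \xi')$ has a global maximum at $\hat\xi(x)$, since $\theta' = g_{\hat\theta}(\xi')$ is only guaranteed to lie in $\Theta_f$ when $\xi' \in P$. That is, there may exist a point $\xi' \in \mathbb{R}^{m-k} \setminus P$ such that $q(x; \xi') > q\left(x; \hat\xi(x)\right)$ and $g_{\hat\theta}(\xi') \notin \Theta_f$.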
3.4.1 Efficient estimation

In Section 2.4.1, it was explained that when an efficient estimator exists, the method of maximum likelihood finds it [36, 62]. The connection between efficiency and the method of constrained maximum likelihood is worth noting. Stoica and Ng did not address this extension in their paper [68], even though Marzetta had shown that the result extends to the constrained case when the FIM is nonsingular [47, theorem 3]. What follows is the general extension of this result, including the case of singular FIMs.

Theorem 3.31. If $t(x)$ is a constrained estimator of $\theta$, required to satisfy the constraint $f(t(x)) = 0$, which is also efficient with respect to the CCRB, then the estimator is a stationary point of the constrained optimization problem in (3.31).

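Proof. This is perhaps more easily proven strictly from the constrained parameter perspective, since the global maximum of the likelihood relative to the implicit reparameterization may not correspond to the global maximum in $\Theta_f$ relative to the constrained parameterization. Since $t(x)$ is efficient, in the mean-square sense we have $t(x) - \theta = \mathrm{CCRB}(\theta)\, s(x; \theta)$ as a function of $\theta$. Then as $\theta \to t(x)$ (this assumes the observations are consistent with $\theta$) we have

$$0 \leftarrow t(x) - \theta = \mathrm{CCRB}(\theta)\, s(x; \theta).$$

Because $\mathrm{CCRB}(\theta) = U(\theta)\left(U^T(\theta) I(\theta) U(\theta)\right)^{-1} U^T(\theta)$ with $U^T(\theta) I(\theta) U(\theta)$ nonsingular, $\mathrm{CCRB}(\theta)\, s = 0$ holds exactly when $U^T(\theta)\, s = 0$. This means the score lies in the orthogonal complement of $\ker F(\theta)$, which is the column space of $F^T(\theta)$. The continuity of the CCRB and the Fisher score therefore implies $s(x; t(x)) = F^T(t(x))\, \lambda$ for some $\lambda \in \mathbb{R}^k$, or

$$s(x; t(x)) - F^T(t(x))\, \lambda = 0,$$

which is the stationarity condition (3.32) of the constrained optimization problem, with $\lambda$ being the vector of Lagrange multipliers. □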

3.4.2  Asymptotic  Normality 

The asymptotic properties of the MLE can be found in section 2.4.2. Therein, it was mentioned that the maximum likelihood estimator is asymptotically unbiased and efficient, with variance asymptotically equivalent to the CRB. A corresponding relationship exists between the CMLE and the CCRB. As before, let the samples $x_1, x_2, \ldots, x_n$ be iid as $x$ from the likelihood $p(x; \theta)$, where $\theta$ is assumed to lie in $\Theta_f$. Denote by $y_n = (x_1, x_2, \ldots, x_n)$ the collection of these samples, so that the likelihood is $p(y_n; \theta) = \prod_{i=1}^{n} p(x_i; \theta)$. The CMLE based on $y_n$ will be denoted $\hat\theta(y_n)$.

Theorem 3.32. Assuming the pdf satisfies the regularity conditions (see (3.19) and the discussion after the proof), the CMLE is asymptotically distributed according to

$$\sqrt{n}\left(\hat\theta(y_n) - \theta\right) \xrightarrow{d} \mathcal{N}\left(0, \mathrm{CCRB}(\theta)\right).$$

A number of results exist in the literature on the asymptotic characteristics of the CMLE (e.g., the works of Aitchison and Silvey [3, 63, 4, 2] and of Crowder [18]). For example, Crowder shows that

$$\sqrt{n}\left(\hat\theta(y_n) - \theta\right) \xrightarrow{d} \mathcal{N}\left(0,\; D^{-1}(\theta) - D^{-1}(\theta) F^T(\theta) \left(F(\theta) D^{-1}(\theta) F^T(\theta)\right)^{-1} F(\theta) D^{-1}(\theta)\right),$$

where $D(\theta) = I(\theta) + F^T(\theta) K F(\theta)$ for any positive semidefinite matrix $K$ that ensures the nonsingularity of $D(\theta)$. These existing results would suffice to verify the connection between the CMLE and the CCRB. It is also insightful (and the point of this treatise), however, to examine the problem entirely from the perspective of the reduced parameter space, i.e., using the implicit function or a null space approach.$^{13}$

Proof. By the implicit function theorem (Theorem 3.3), there exist an open set $O \subset \Theta_f$ containing $\theta$, an open set $P \subset \mathbb{R}^{m-k}$, and a continuously differentiable bijection $g_\theta : P \to O$ such that $\theta = g_\theta(\xi)$ for some $\xi \in P$. The likelihood for $\xi$ is given by $q(y_n; \xi) = p\left(y_n; g_\theta(\xi)\right)$.

Let $\hat\xi(y_n)$ be the MLE of $\xi$ based on the likelihood $q(y_n; \xi)$. Since the MLE is consistent and asymptotically efficient,

$$\sqrt{n}\left(\hat\xi(y_n) - \xi\right) \xrightarrow{d} \mathcal{N}\left(0, I^{-1}(\xi)\right). \qquad (3.34)$$

In particular, since $\hat\xi(y_n) \to \xi$ as $n \to \infty$, we have $\hat\xi(y_n) \in P$ for $n$ sufficiently large, say $n > N$. Let $\hat\theta(y_n)$ be the CMLE of $\theta$ based on the likelihood $p(y_n; \theta)$ and the constraint $f(\theta) = 0$. By the invariance property [36, 62], for $n > N$, $\hat\theta(y_n) = g_\theta\left(\hat\xi(y_n)\right)$ and $\hat\theta(y_n) \in O$. Therefore, since $\hat\xi(y_n) \to \xi$ as $n \to \infty$, $\hat\theta(y_n) \to \theta$ as well, and the CMLE is consistent.

The Taylor series expansion (see section 3.1.5) of $g_\theta(\xi')$ can be truncated using a Lagrange remainder term [38, p. 232], as in

$$g_\theta\left(\hat\xi(y_n)\right) = g_\theta(\xi) + G_\theta\left(\bar\xi, \hat\xi(y_n)\right)\left(\hat\xi(y_n) - \xi\right),$$

where $G_\theta\left(\bar\xi, \hat\xi(y_n)\right)$ represents a matrix of the form of $G_\theta(\xi)$ in which each row $i = 1, \ldots, m$ is evaluated at a possibly different point $\bar\xi_i$. Each of these points lies on the line segment from $\xi$ to $\hat\xi(y_n)$. From the invariance property of the MLE, this can be rewritten as

$$\hat\theta(y_n) - \theta = G_\theta\left(\bar\xi, \hat\xi(y_n)\right)\left(\hat\xi(y_n) - \xi\right).$$

Since the MLE $\hat\xi(y_n)$ is consistent, $G_\theta\left(\bar\xi, \hat\xi(y_n)\right) \to G_\theta(\xi)$. Given this and (3.34), Slutsky's theorem [62, p. 60] gives

$$\sqrt{n}\left(\hat\theta(y_n) - \theta\right) \xrightarrow{d} \mathcal{N}\left(0,\; G_\theta(\xi)\, I^{-1}(\xi)\, G_\theta^T(\xi)\right),$$

which by theorem 3.5 shows that $\hat\theta(y_n)$ is asymptotically efficient with respect to the CCRB. □

13 Nevertheless, a proof directly from Crowder's asymptotic normality result is detailed in appendix A.3.
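Crowder's covariance can be compared numerically with the CCRB. In the Python/NumPy sketch below, the FIM is deliberately constructed to be singular while $U^T I U$ stays nonsingular. The matrices are random and hypothetical. $K$ is taken as a multiple of the identity, which is one admissible choice among many. For each such $K$, Crowder's expression is expected to match $U\left(U^T I U\right)^{-1} U^T$, because $FU = 0$ implies $U^T D U = U^T I U$.

```python
import numpy as np

rng = np.random.default_rng(4)
m, k = 5, 2
F = rng.standard_normal((k, m))                 # constraint Jacobian F(theta), illustrative
_, _, Vt = np.linalg.svd(F)
U = Vt[k:].T                                    # orthonormal basis of ker F(theta)

# Singular FIM of rank m - k whose restriction U^T I U is nonsingular (hypothetical).
B = rng.standard_normal((m - k, m - k))
fim = U @ (B @ B.T + np.eye(m - k)) @ U.T
print(np.linalg.matrix_rank(fim))               # 3: I(theta) itself cannot be inverted

ccrb = U @ np.linalg.solve(U.T @ fim @ U, U.T)

def crowder_cov(fim, F, K):
    """Crowder's covariance with D = I + F^T K F, for K chosen so that D is nonsingular."""
    D = fim + F.T @ K @ F
    Di = np.linalg.inv(D)
    return Di - Di @ F.T @ np.linalg.solve(F @ Di @ F.T, F @ Di)

for scale in (1.0, 10.0):
    K = scale * np.eye(k)
    print(np.allclose(crowder_cov(fim, F, K), ccrb))   # True for each choice of K
```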

The conditions for asymptotic normality with respect to the CCRB are the conditions for $\hat\xi(y_n)$ to be asymptotically normal [14, p. 516]. For the MLE these include (a) differentiability of the Fisher score, (b) continuity of the Fisher information with respect to the parameter and its nonsingularity at $\xi$, and (c) consistency. For the CMLE and theorem 3.32, these translate to:

(a) differentiability of the Fisher score and the existence of first and second derivatives of any implicit function (or, equivalently, of the constraint $f$);

(b) continuity of $U^T(\theta') I(\theta') U(\theta')$ with respect to $\theta'$, and its regularity at $\theta' = \theta$;

(c) consistency of the CMLE.
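The Monte Carlo sketch below illustrates Theorem 3.32 on a simple model that is assumed here for illustration and does not appear in the text. The samples are $x_i \sim \mathcal{N}(\theta, \sigma^2 I_2)$ and the constraint is the unit norm, $f(\theta) = \theta^T\theta - 1 = 0$. For this model:

- the CMLE is the point on the circle nearest to the sample mean;
- $I(\theta) = \sigma^{-2} I_2$;
- $U(\theta) = [-\theta_2, \theta_1]^T$;
- $\mathrm{CCRB}(\theta) = \sigma^2 U U^T$, which is a singular matrix.

The empirical covariance of $\sqrt{n}\,(\hat\theta(y_n) - \theta)$ should then be close to the CCRB (Python/NumPy).

```python
import numpy as np

rng = np.random.default_rng(5)
sigma, n, trials = 0.5, 400, 20000
phi = 0.7
theta = np.array([np.cos(phi), np.sin(phi)])     # true parameter on the unit circle
u = np.array([-theta[1], theta[0]])              # U(theta): unit tangent spanning ker F(theta)
ccrb = sigma**2 * np.outer(u, u)                 # U (U^T I U)^{-1} U^T with I = I_2 / sigma^2

# The mean of n iid N(theta, sigma^2 I) draws is N(theta, sigma^2 I / n); sample it directly.
xbar = theta + (sigma / np.sqrt(n)) * rng.standard_normal((trials, 2))
cmle = xbar / np.linalg.norm(xbar, axis=1, keepdims=True)   # nearest point on the circle

scaled = np.sqrt(n) * (cmle - theta)
emp_cov = np.cov(scaled.T)
print(np.round(emp_cov, 3))                      # approximately equal to the next line
print(np.round(ccrb, 3))
```

Sampling the sample mean directly, rather than all $n$ draws, is only a shortcut. It relies on the exact Gaussian law of $\bar{x}$ under this assumed model.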

3.4.3  The  Method  of  Scoring  Under  Parametric  Constraints 

The method of scoring for unconstrained parameters is detailed in section 2.4.3. Here, we examine scoring with constraints. Assume we have an iterate $\theta^{(k)} \in \Theta_f$ and we wish to improve it in the sense of the optimization problem (3.31). The method of scoring does not directly apply, because its projection step does not take the constraint into account. The unconstrained ascent direction is therefore likely not the appropriate path for maximizing the likelihood subject to the functional equality constraints. Thus, it is desirable to have projected direction and restoration steps that take the constraints into consideration.

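Given an initial estimate $\theta^{(k)}$, there exist a set $O \ni \theta^{(k)}$ open in $\Theta_f$, a set $P$ open in $\mathbb{R}^{m-k}$, and a continuously differentiable function $g_{\theta^{(k)}} : \mathbb{R}^{m-k} \to \mathbb{R}^m$ with the following properties: $g_{\theta^{(k)}}$ is a diffeomorphism on $P$, $g_{\theta^{(k)}}(P) = O$, and, in particular, there exists a $\xi^{(k)} \in P$ such that $g_{\theta^{(k)}}\left(\xi^{(k)}\right) = \theta^{(k)}$. Scoring can now be applied in the reduced parameter space $\mathbb{R}^{m-k}$.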

For the given set of observations $x$ and this corresponding initial estimate $\xi^{(k)}$, the method of scoring suggests the projection step

$$\xi^{(k+1)} = \xi^{(k)} + I^{-1}\left(\xi^{(k)}\right) s\left(x; \xi^{(k)}\right)$$

to generate a better estimate in the sense of maximizing the likelihood $q(x; \xi) = p\left(x; g_{\theta^{(k)}}(\xi)\right)$. As with many iterative procedures, convergence is only guaranteed under certain initial conditions. If the projection step, or shift, is too large, then $\xi^{(k+1)}$ may not be a usable point, i.e., an iterate that increases the value of the likelihood function. To add stability to the procedure, a step-size rule (shift-cutting) is often employed. This amounts to including a multiplicative factor $\alpha^{(k)} \in [0, 1]$, which modifies the projection step to

$$\xi^{(k+1)} = \xi^{(k)} + \alpha^{(k)} I^{-1}\left(\xi^{(k)}\right) s\left(x; \xi^{(k)}\right).$$
Choosing an appropriate step-size rule for $\alpha^{(k)}$ can guarantee convergence, although typically at a cost to the rate of convergence.


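The Taylor series expansion (see section 3.1.5) of $g_{\theta^{(k)}}$ about $\xi^{(k)}$, evaluated at $\xi^{(k+1)}$, is given by

$$g_{\theta^{(k)}}\left(\xi^{(k+1)}\right) = g_{\theta^{(k)}}\left(\xi^{(k)}\right) + G_{\theta^{(k)}}\left(\xi^{(k)}\right)\left(\xi^{(k+1)} - \xi^{(k)}\right) + o\left(\left\|\xi^{(k+1)} - \xi^{(k)}\right\|\right),$$

where $o\left(\left\|\xi^{(k+1)} - \xi^{(k)}\right\|\right)$ is a term that shrinks faster than $\left\|\xi^{(k+1)} - \xi^{(k)}\right\|$ as $k \to \infty$. Ignoring this error term generates an iteration in the larger dimensional parameter space $\Theta \subset \mathbb{R}^m$, with the next iterate defined as $\theta^{(k+1)} = g_{\theta^{(k)}}\left(\xi^{(k+1)}\right)$. We use $I(\xi) = G^T_{\theta^{(k)}}(\xi)\, I(\theta)\, G_{\theta^{(k)}}(\xi)$ and $s(x; \xi) = G^T_{\theta^{(k)}}(\xi)\, s(x; \theta)$. We also use the fact that the columns of $G_{\theta^{(k)}}\left(\xi^{(k)}\right)$ span the same space as those of $U\left(\theta^{(k)}\right)$. Together these give

$$\begin{aligned}
\theta^{(k+1)} &= g_{\theta^{(k)}}\left(\xi^{(k)}\right) + G_{\theta^{(k)}}\left(\xi^{(k)}\right)\left(\xi^{(k+1)} - \xi^{(k)}\right) \\
&= \theta^{(k)} + \alpha^{(k)} G_{\theta^{(k)}}\left(\xi^{(k)}\right) \left(G^T_{\theta^{(k)}}\left(\xi^{(k)}\right) I\left(\theta^{(k)}\right) G_{\theta^{(k)}}\left(\xi^{(k)}\right)\right)^{-1} G^T_{\theta^{(k)}}\left(\xi^{(k)}\right) s\left(x; \theta^{(k)}\right) \\
&= \theta^{(k)} + \alpha^{(k)} U\left(\theta^{(k)}\right) \left(U^T\left(\theta^{(k)}\right) I\left(\theta^{(k)}\right) U\left(\theta^{(k)}\right)\right)^{-1} U^T\left(\theta^{(k)}\right) s\left(x; \theta^{(k)}\right).
\end{aligned}$$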
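The last equality in the derivation above depends on $G_{\theta^{(k)}}$ only through its column space. Any tangent basis $G = UR$ with $R$ invertible gives the same step as the orthonormal $U$. The short Python/NumPy check below confirms this. It also confirms that the step is tangent to the constraint, $F\,\Delta\theta = 0$. The FIM, score and constraint Jacobian are random and chosen purely for illustration. For a curved constraint, such a tangent step generally leaves $\Theta_f$.

```python
import numpy as np

rng = np.random.default_rng(6)
m, k = 4, 1
F = rng.standard_normal((k, m))                    # constraint Jacobian at theta^(k), illustrative
_, _, Vt = np.linalg.svd(F)
U = Vt[k:].T                                       # orthonormal basis of ker F, as in (3.6)
R = rng.standard_normal((m - k, m - k)) + 5.0 * np.eye(m - k)   # invertible change of basis
G = U @ R                                          # a non-orthonormal tangent basis
A = rng.standard_normal((m, m))
fim = A @ A.T + np.eye(m)                          # I(theta^(k)), illustrative
s = rng.standard_normal(m)                         # s(x; theta^(k)), illustrative
alpha = 0.5

step_G = alpha * G @ np.linalg.solve(G.T @ fim @ G, G.T @ s)
step_U = alpha * U @ np.linalg.solve(U.T @ fim @ U, U.T @ s)
print(np.allclose(step_G, step_U))                 # True: the step does not depend on the basis
print(np.allclose(F @ step_U, 0))                  # True: the step lies in the tangent space
```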

In comparison with the classical method of scoring, this iteration

$$\theta^{(k+1)} = \theta^{(k)} + \alpha^{(k)} \mathrm{CCRB}\left(\theta^{(k)}\right) s\left(x; \theta^{(k)}\right) \qquad (3.35)$$

essentially replaces the CRB with the CCRB. This is the projection step of the method of scoring with parametric equality constraints.$^{14}$ Intuitively, this should seem appropriate, since the CCRB is a generalization of the CRB. However, even with an appropriate step-size rule to generate usable iterates, there is no certainty that $\theta^{(k+1)} \in \Theta_f$. In fact, it is likely that $\theta^{(k+1)}$ will not be a feasible point.

14 Osborne [55] used a Lagrangian multiplier approach to develop the method of scoring. His scenario, however, was restricted to linear constraints and, hence, lacked the restoration step. Additionally, he makes no mention that the matrix projecting the negative Jacobian of the objective is a constrained Cramer-Rao bound. Note that the structure of this projection step is well known, as a nonstatistical formulation exists for the conventional optimization problem in [25, p. 178].

To correct this, an encompassing restoration step is required to produce the next iterate, i.e.,

$$\theta^{(k+1)} = \pi\left[\theta^{(k)} + \alpha^{(k)} \mathrm{CCRB}\left(\theta^{(k)}\right) s\left(x; \theta^{(k)}\right)\right], \qquad (3.36)$$

where $\pi[\cdot]$ is the natural projection of $\mathbb{R}^m$ onto $\Theta_f$. This is the method of scoring with parametric equality constraints. With this additional restoration step, the usability of the iterate would be tested after (3.36), as opposed to after (3.35), and the iterate accepted or rejected there. For a convex set, the natural projection is well defined. In general, though, some other rule will likely be needed when there is no unique shortest distance to $\Theta_f$, e.g., reducing the step size $\alpha^{(k)}$.

Figure 3.3: Path created by iterates from the method of scoring with constraints.

Simple projections, e.g., onto planar or spherical constraints, are relatively simple operations. More commonly, however, the restoration may not be expressible analytically. To ensure that the iterates satisfy the constraints approximately, one approach is to apply an additional iterative process [25]

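$$\theta^{(k,l+1)} = \theta^{(k,l)} - F^T\left(\theta^{(k,l)}\right) \left(F\left(\theta^{(k,l)}\right) F^T\left(\theta^{(k,l)}\right)\right)^{-1} f\left(\theta^{(k,l)}\right),$$

where $\theta^{(k,0)}$ is the unrestored iterate produced by the projection step (3.35). The iteration stops with $\theta^{(k+1)} = \theta^{(k,l)}$ once $f\left(\theta^{(k,l)}\right) \approx 0$ to a desired degree of accuracy, provided the iterate is still usable. Alternatively, a penalty can be incorporated into the objective function, e.g., by maximizing

$$\log p(x; \theta') - \eta \sum_{i=1}^{k} \left|f_i(\theta')\right|$$

for some positive $\eta$, where $f_i$ is the $i$th constraint equation. The penalty limits the divergence of the iterations away from $\Theta_f$.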
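The following Python/NumPy sketch combines the pieces above: the projection step (3.35), restoration by the iterative process, and step halving. Several parts are assumptions of the sketch:

- usability is tested only by requiring that the log-likelihood not decrease, which is a simpler rule than (3.37);
- the helper names `restore` and `constrained_scoring` are hypothetical;
- the model is hypothetical: $n$ iid draws $x_i \sim \mathcal{N}(\theta, \Sigma)$ with known $\Sigma$ and the unit-norm constraint $f(\theta) = \theta^T\theta - 1$.

The result is checked for feasibility and against a brute-force search over the circle.

```python
import numpy as np

def null_basis(Fj):
    """Orthonormal basis of ker(Fj) for a full row rank Jacobian Fj."""
    _, _, Vt = np.linalg.svd(Fj)
    return Vt[Fj.shape[0]:].T

def restore(theta, f, jac, tol=1e-12, max_iter=50):
    """Iterative restoration theta <- theta - F^T (F F^T)^{-1} f(theta)."""
    for _ in range(max_iter):
        r = f(theta)
        if np.linalg.norm(r) < tol:
            break
        Fj = jac(theta)
        theta = theta - Fj.T @ np.linalg.solve(Fj @ Fj.T, r)
    return theta

def constrained_scoring(theta, loglik, score, fim, f, jac, iters=100, beta=0.5):
    """Scoring with parametric equality constraints: step (3.35), then restoration."""
    for _ in range(iters):
        U = null_basis(jac(theta))
        ccrb = U @ np.linalg.solve(U.T @ fim(theta) @ U, U.T)
        direction = ccrb @ score(theta)
        alpha = 1.0
        cand = restore(theta + alpha * direction, f, jac)
        while loglik(cand) < loglik(theta) and alpha > 1e-10:
            alpha *= beta                                   # shift-cutting
            cand = restore(theta + alpha * direction, f, jac)
        if loglik(cand) < loglik(theta):
            break                                           # no usable step: stop
        theta = cand
    return theta

# Hypothetical model: n iid N(theta, Sigma) draws, theta on the unit circle.
rng = np.random.default_rng(7)
Sigma = np.array([[0.3, 0.05], [0.05, 0.2]])
Si = np.linalg.inv(Sigma)
n = 100
theta_true = np.array([np.cos(1.0), np.sin(1.0)])
xbar = rng.multivariate_normal(theta_true, Sigma, size=n).mean(axis=0)

loglik = lambda t: -0.5 * n * (xbar - t) @ Si @ (xbar - t)    # up to an additive constant
score = lambda t: n * Si @ (xbar - t)
fim = lambda t: n * Si
f = lambda t: np.array([t @ t - 1.0])
jac = lambda t: 2.0 * t.reshape(1, -1)

theta_hat = constrained_scoring(restore(xbar, f, jac), loglik, score, fim, f, jac)

# Brute-force comparison over a fine grid of angles on the circle.
phis = np.linspace(0.0, 2.0 * np.pi, 200001)
grid = np.stack([np.cos(phis), np.sin(phis)], axis=1)
D = xbar - grid
ll_grid = -0.5 * n * np.einsum("ij,jk,ik->i", D, Si, D)
print(abs(theta_hat @ theta_hat - 1.0) < 1e-9)         # True: feasible
print(loglik(theta_hat) >= ll_grid.max() - 1e-6)        # True: matches the grid maximum
```

For a circle, this restoration simply rescales the iterate to unit length. For other constraint surfaces, the same `restore` loop only approximates the natural projection $\pi[\cdot]$.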

3.4.3.1 Convergence Properties

A large class of conditions guarantees convergence in fixed point theorems, some of which can be found in [64, 25, 10]. The most general statement is that, given an initialization "sufficiently close" to the maximum $\hat\theta(x)$, the sequence generated by the algorithm will converge to this CMLE. As the method of scoring with parametric equality constraints is a Newton-type method, existing convergence properties for these methods can be adapted here. It is impossible to cover all the potential approaches to developing properties for this constrained scoring algorithm. This section therefore focuses on properties similar to those found in Goldstein [22]. To ease reading, the proofs of the theorems in this section are presented in appendix B.

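First, define $\Theta_{g(k)} = \left\{\theta' \in \Theta_f : p(x; \theta') \geq p\left(x; \theta^{(k)}\right)\right\}$ as the set of all feasible and usable iterates after the $k$th iterate $\theta^{(k)}$. The step rule for the properties in this section is as follows. For a fixed $\beta \in (0, 1)$, choose the least positive integer $m^{(k)}$ such that $\alpha^{(k)} = \beta^{m^{(k)}}$ satisfies the inequality

$$\alpha^{(k)} \left(\log p\left(x; \theta^{(k+1)}\right) - \log p\left(x; \theta^{(k)}\right)\right) \geq \kappa \left\|\theta^{(k+1)} - \theta^{(k)}\right\|^2_{I(\theta^{(k)})}, \qquad (3.37)$$

where $\theta^{(k+1)}$ is defined by (3.36). If no finite $m^{(k)}$ exists, then choose $\alpha^{(k)} = 0$. This type of step-size rule enforces a stepwise Lipschitz condition. For theorems 3.33 and 3.37, we require that $\Theta_f$ be convex.$^{15}$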


Theorem 3.33. If, for an iterate $\theta^{(k)} \in \Theta_f$, there does not exist an $\alpha^{(k)} > 0$ that satisfies (3.37), then $\theta^{(k)}$ is a stationary point.


Therefore, when the step rule forces the choice $\alpha^{(k)} = 0$, the method of scoring with parametric equality constraints has converged. The next theorem gives a property of the sequence of likelihood values generated by the iterates.


Theorem 3.34. The sequence $\left\{p\left(x; \theta^{(k)}\right)\right\}$ is a monotone increasing sequence. Furthermore, if $p(x; \cdot)$ is bounded above, then $\left\{p\left(x; \theta^{(k)}\right)\right\}$ converges.

15 In general, for a nonlinear equality constraint, $\Theta_f$ will not be convex. Locally, however, restoring the linear projection onto the tangent space of the CCRB looks like restoration onto a convex set for sufficiently small $\alpha^{(k)}$, i.e., $\Theta_f$ appears locally convex. It may be possible to restrict the step size to this local convexity, but such an enhancement is beyond the scope of the work presented here. For inequality constraints, $\Theta_f$ may be convex. Such constraints, as in example 3.15, will not inform the projection update in (3.35), although they might inform the restoration update in (3.36).
Thus, $\Theta_{g(k)}$ is a decreasing sequence of closed (nested) sets, i.e., $\Theta_{g(k+1)} \subset \Theta_{g(k)}$. Equivalently, for any given sequence, $\theta^{(i)} \in \Theta_{g(j)}$ provided $i > j$. That is, using a proper step-size rule will guarantee usable iterates. The monotonicity of $\left\{p\left(x; \theta^{(k)}\right)\right\}$, even if bounded above, does not imply monotonicity of the sequence $\left\{\log p\left(x; \theta^{(k+1)}\right) - \log p\left(x; \theta^{(k)}\right)\right\}$. However, this sequence does converge, as the following theorem states.

Theorem 3.35. If the likelihood $p(x; \cdot)$ is bounded above, then the sequence

$$\left\{\log p\left(x; \theta^{(k+1)}\right) - \log p\left(x; \theta^{(k)}\right)\right\}$$

vanishes.


Hence, a bounded likelihood function guarantees the existence of maximum likelihood solution(s). This can also be shown by the nested interval theorem [38, problem 2-1-12]. These properties would follow from any rule that chooses feasible, usable iterates. The value of the rule in (3.37) is that it also allows statements to be made about the sequence $\left\{\theta^{(k)}\right\}$ itself.

Theorem 3.36. If the likelihood $p(x; \cdot)$ is bounded above, then the sequence

$$\left\{\left\|\theta^{(k+1)} - \theta^{(k)}\right\|_{I(\theta^{(k)})}\right\}$$

vanishes as $k \to \infty$.


This theorem does not guarantee that the sequence $\left\{\theta^{(k)}\right\}$ will converge or even have a limit point.$^{16}$ For example, the partial sums $\sum_{j=1}^{k} 1/j$ form a sequence that satisfies the above theorem but has no real-valued limit points. However, if the set $\Theta_{g(k)}$ is bounded for some $k$, then the Bolzano-Weierstrass theorem [38, p. 52, theorem 2-12] implies the existence of a limit point of the sequence.

16 A point $a$ is a limit point of a sequence $\{a_n\}$ if, for any integer $K$ and any $\epsilon > 0$, there exists a $k > K$ such that $|a_k - a| < \epsilon$.

Theorem 3.37. If $\Theta_{g(1)}$ is compact and convex, then limit points of the sequence $\left\{\theta^{(k)}\right\}$ are also stationary points.

The theorem remains true if $\Theta_{g(k)}$ is compact and convex for some $k$.

Theorem 3.38. Suppose $\Theta_{g(1)}$ is compact for all sequences in a closed set of $\Theta_f$, and there is a unique limit point $\theta^*$ for all such sequences. Then $\lim_{k \to \infty} \theta^{(k)} = \theta^*$ for every sequence $\left\{\theta^{(k)}\right\}$. Also, $\theta^*$ is the maximum of $p(x; \cdot)$.

3.4.3.2 Linear constraints

Linear constraints on the parameter typically restrict the parameter to a set of the form $\Theta_f = \{\theta' \in \Theta : F\theta' + v = 0\}$ (see section 3.3). Under this linear constraint, the restoration operation $\pi[\cdot]$ is redundant, because any step remains in the constraint space: $\theta^{(k)} \in \Theta_f$ and $F \cdot \mathrm{CCRB}\left(\theta^{(k)}\right) = 0$. Thus the method of scoring with parametric equality constraints in (3.36) simplifies to the iteration

$$\theta^{(k+1)} = \theta^{(k)} + \alpha^{(k)} \mathrm{CCRB}\left(\theta^{(k)}\right) s\left(x; \theta^{(k)}\right).$$

Example 3.39 (Linear model with linear constraints). In the linear model with normal noise of section 3.3.2, we have $x = H\theta + w$ with $w \sim \mathcal{N}(0, C)$. In this case, the negative Hessian is the FIM. The optimization problem becomes a null space quadratic exercise [21], i.e., the minimization of a quadratic objective subject to a linear constraint. The CCRB is $U\left(U^T H^T C^{-1} H U\right)^{-1} U^T$, which is constant with respect to the parameter, and the Fisher score is $s(x; \theta') = H^T C^{-1}(x - H\theta')$. Hence, if $\theta^{(1)}$ is any feasible vector, e.g., $\theta^{(1)} = -F^T\left(FF^T\right)^{-1} v$, then the method of scoring with constraints (with unit step size) finds the CMLE in one step to be

$$\theta^{(2)} = \theta^{(1)} + \mathrm{CCRB} \cdot H^T C^{-1}\left(x - H\theta^{(1)}\right),$$

which is exactly the formula in (3.29) from theorem 3.30. The next iterate is $\theta^{(3)} = \theta^{(2)} + \mathrm{CCRB} \cdot H^T C^{-1}\left(x - H\theta^{(2)}\right)$. It shows that the procedure has reached a fixed point: using $\mathrm{CCRB} \cdot H^T C^{-1} H \cdot \mathrm{CCRB} = \mathrm{CCRB}$,

$$\begin{aligned}
\mathrm{CCRB}\, H^T C^{-1}\left(x - H\theta^{(2)}\right) &= \mathrm{CCRB}\, H^T C^{-1}\left(x - H\left(\theta^{(1)} + \mathrm{CCRB}\, H^T C^{-1}\left(x - H\theta^{(1)}\right)\right)\right) \\
&= \mathrm{CCRB}\, H^T C^{-1}\left(x - H\theta^{(1)}\right) - \mathrm{CCRB}\, H^T C^{-1} H\, \mathrm{CCRB}\, H^T C^{-1}\left(x - H\theta^{(1)}\right) \\
&= 0.
\end{aligned}$$

Therefore, $\hat\theta_{CML}(x) = \theta^{(2)}$.
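Example 3.39 can be reproduced numerically. The Python/NumPy sketch below uses random, purely illustrative data and a hypothetical helper `scoring_step`. It runs two unit-step constrained scoring iterations from the minimum-norm feasible point. The second iteration is expected to leave the iterate unchanged. The sketch also checks the identity $\mathrm{CCRB}\, Q\, \mathrm{CCRB} = \mathrm{CCRB}$ behind the fixed point.

```python
import numpy as np

rng = np.random.default_rng(8)
n, m, k = 10, 3, 1
H = rng.standard_normal((n, m))
C = np.diag(rng.uniform(0.5, 1.5, n))
F = rng.standard_normal((k, m))
v = rng.standard_normal(k)
x = rng.standard_normal(n)

Ci = np.linalg.inv(C)
Q = H.T @ Ci @ H                                  # FIM, equal to the negative Hessian here
_, _, Vt = np.linalg.svd(F)
U = Vt[k:].T
ccrb = U @ np.linalg.solve(U.T @ Q @ U, U.T)      # constant CCRB of the linear model

def scoring_step(theta):
    """Constrained scoring step with unit step size; no restoration is needed."""
    return theta + ccrb @ H.T @ Ci @ (x - H @ theta)

theta1 = -F.T @ np.linalg.solve(F @ F.T, v)       # feasible starting point
theta2 = scoring_step(theta1)
theta3 = scoring_step(theta2)
print(np.allclose(theta3, theta2))                # True: fixed point after one step
print(np.allclose(F @ theta2 + v, 0))             # True: the iterate stays feasible
print(np.allclose(ccrb @ Q @ ccrb, ccrb))         # True: the identity behind the fixed point
```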

Example 3.40 (Jamshidian's GP algorithm). Jamshidian [33] developed a Gradient Projection (GP) algorithm for maximizing the likelihood subject to linear parameter constraints using the iteration

$$\theta^{(k+1)} = \theta^{(k)} + \alpha^{(k)}\left(W^{-1} - W^{-1} F^T \left(F W^{-1} F^T\right)^{-1} F W^{-1}\right) s\left(x; \theta^{(k)}\right) \qquad (3.38)$$

for some positive definite matrix $W$. Jamshidian suggests that an optimal choice with regard to the algorithm's rate of convergence is a possibly diagonally loaded (negative) Hessian of the log-likelihood,

$$W\left(x, \theta^{(k)}\right) = -\frac{\partial^2 \log p\left(x; \theta^{(k)}\right)}{\partial \theta \, \partial \theta^T} + \gamma^{(k)} I_{m \times m},$$

where $\gamma^{(k)} > 0$ is chosen large enough to ensure the positive definiteness of the matrix $W$. This formulation is closely connected to the method of scoring with constraints. When the FIM is nonsingular, choosing $W\left(x, \theta^{(k)}\right)$ to be the FIM makes the GP iteration equivalent to scoring. Indeed, the projecting matrix $W^{-1} - W^{-1} F^T \left(F W^{-1} F^T\right)^{-1} F W^{-1}$ is similar to the Marzetta form of the CCRB in [47], with $W\left(x, \theta^{(k)}\right)$ replacing the FIM. This fact yields a slight generalization of the GP iteration, given by

$$\theta^{(k+1)} = \theta^{(k)} + \alpha^{(k)} U\left(U^T W\left(x, \theta^{(k)}\right) U\right)^{-1} U^T s\left(x; \theta^{(k)}\right),$$

where $U$ is defined as in (3.6). In this formulation, the projecting metric $W\left(x, \theta^{(k)}\right)$ only needs to be positive semidefinite, provided $U^T W U$ is nonsingular. Alternatively, the Aitchison and Silvey [4] substitution for the FIM, $I\left(\theta^{(k)}\right) + F^T\left(\theta^{(k)}\right) K F\left(\theta^{(k)}\right)$, could be used for $W$ in (3.38). It would replace a diagonally loaded Hessian of the log-likelihood (or even a diagonally loaded Fisher information matrix).

In  this  sense,  the  two  iterations  are  equivalent  for  the  linear  model,  when  the 
FIM  is  simply  the  negative  Hessian.  This  occurs  when  the  log-likelihood  is  quadratic 
(normal).  This  also  suggests  the  adaption  of  Jamshidian’s  GP  algorithm  to  cases  of 
nonlinear  constraints.  Likewise,  asymptotically,  where  yn  is  denoted  as  in  section 
3.4.2,  then  by  the  law  of  large  numbers  the  Jamshidian  projection  matrix 

W(yn,  0 W)  -  nI(0W)  +  7 {k)Imxm 

for  an  arbitrarily  small  (possibly  zero)  7^  >  0,  as  1(0^)  is  positive  semidefinite. 
The projection step update then becomes

$$\theta^{(k+1)} \approx \theta^{(k)} + \frac{\alpha^{(k)}}{n}\, \mathrm{CCRB}(\theta^{(k)})\, s(y_n; \theta^{(k)}),$$

i.e., essentially equivalent to the method of scoring with parametric equality constraints in (3.35).
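The equivalence for the linear model can be checked numerically. The sketch below, assuming Python with NumPy and a hypothetical linear Gaussian model $y = A\theta + w$ with a single linear constraint, takes one constrained scoring step with unit step size from a feasible point and compares it with the constrained least-squares solution of the KKT (Lagrangian) system. Here `fim` is the Fisher information of the whole data set, so the $1/n$ factor is already absorbed into it.

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs, m = 40, 4
A = rng.standard_normal((n_obs, m))
theta_true = np.array([1.0, -2.0, 0.5, 0.5])
sigma2 = 0.25
y = A @ theta_true + np.sqrt(sigma2) * rng.standard_normal(n_obs)

F = np.array([[0.0, 0.0, 1.0, -1.0]])   # linear constraint theta_3 - theta_4 = 0
c = np.zeros(1)

def score(theta):
    """Score of the Gaussian log-likelihood of y = A theta + w."""
    return A.T @ (y - A @ theta) / sigma2

fim = A.T @ A / sigma2                   # whole-data FIM, equal to the negative Hessian here

_, _, Vt = np.linalg.svd(F)
U = Vt[F.shape[0]:].T                    # orthonormal basis of null(F); F has full row rank
ccrb = U @ np.linalg.solve(U.T @ fim @ U, U.T)

theta0 = np.zeros(m)                     # feasible starting point: F @ theta0 = c
theta1 = theta0 + ccrb @ score(theta0)   # one constrained scoring step with alpha = 1

# Constrained least squares from the KKT (Lagrangian) system, for comparison.
kkt = np.block([[A.T @ A, F.T], [F, np.zeros((1, 1))]])
theta_cls = np.linalg.solve(kkt, np.concatenate([A.T @ y, c]))[:m]

print(np.allclose(theta1, theta_cls))
```

Because the log-likelihood is exactly quadratic, a single step lands on the constrained maximizer; for nonlinear models the same step is only a local quadratic approximation and is iterated.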


3.5  Hypothesis  testing 

In section 2.5, hypothesis testing (the Rao and Wald tests) using the CRB was reviewed. In this section, hypothesis testing is considered under a constrained alternative. Assume $h: \mathbb{R}^m \to \mathbb{R}^r$ is a consistent and nonredundant differentiable function, which is also consistent and nonredundant with the differentiable function $f: \mathbb{R}^m \to \mathbb{R}^k$. Hence, $\Theta_h = \{\theta' : h(\theta') = 0\} \subset \Theta_f$, where $\Theta_f$ is defined as in (3.4), and also

$$\operatorname{rank}\begin{bmatrix} H(\theta) \\ F(\theta) \end{bmatrix} = r + k < m \quad \text{and} \quad r < m - k.$$

Then the hypothesis test can be stated as

$$H_0: h(\theta) = 0 \quad \text{vs.} \quad H_1: f(\theta) = 0. \tag{3.39}$$


Naturally, $f(\theta') = 0$ under these conditions defines an implicit function locally, so assume $g_\theta: \mathbb{R}^{m-k} \to \mathbb{R}^m$ is such a function satisfying theorem 3.3 for any $\theta \in \Theta_f$. Then, a locally (or asymptotically) equivalent hypothesis can be stated as

$$H_0: h(g_\theta(\xi)) = 0 \quad \text{vs.} \quad H_1: h(g_\theta(\xi)) \neq 0. \tag{3.40}$$


For this formulation, the well-known Rao and Wald statistics were given in section 2.5.
3.5.1  The  Rao  statistic


For the hypothesis testing scenario in (3.40), the Rao test statistic presented in section 2.5.1 is given by

$$\rho(y_n) = s^T\big(y_n; \hat\xi_{h(g)}(y_n)\big)\, I_n^{-1}\big(\hat\xi_{h(g)}(y_n)\big)\, s\big(y_n; \hat\xi_{h(g)}(y_n)\big),$$

where $s(y_n; \hat\xi_{h(g)}(y_n))$ is the Fisher score of the observations $y_n$ (as defined in section 3.4.2), evaluated at the $H_0$-constrained maximum likelihood estimate (CMLE), or constrained root of the likelihood estimate (CRLE), $\hat\xi_{h(g)}(y_n)$ of the likelihood $q(y_n; \xi) = p(y_n; g_\theta(\xi))$, and $I_n(\hat\xi_{h(g)}(y_n))$ is the $n$-sample Fisher information evaluated at the CMLE. In this context, the CMLE is the solution to the optimization problem

$$\max_{\xi} \log q(y_n; \xi) \quad \text{s.t.} \quad h \circ g_{\theta'}(\xi) = 0.$$

As in theorem 3.5, $s(y_n; \xi) = G_{\theta'}^T(\xi)\, s(y_n; \theta')$ and $I_n(\xi) = n\, G_{\theta'}^T(\xi)\, I(g_{\theta'}(\xi))\, G_{\theta'}(\xi)$. Also, recall that for sufficiently large $n$, $\hat\theta_h(y_n) = g_\theta(\hat\xi_{h(g)}(y_n))$. Therefore, the locally (or asymptotically) equivalent Rao test statistic for the hypothesis in (3.39) is

$$\rho(y_n) = \frac{1}{n}\, s^T\big(y_n; \hat\theta_h(y_n)\big)\, \mathrm{CCRB}\big(\hat\theta_h(y_n)\big)\, s\big(y_n; \hat\theta_h(y_n)\big), \tag{3.41}$$

which is analogous to (2.9) with the CCRB replacing the CRB. Under $H_0$, $\rho(y_n)$ is still asymptotically $\chi^2_r$ in distribution. The corresponding Lagrange-multiplier variant of this statistic is given by

$$\rho(y_n) = \frac{1}{n}\, \hat\lambda_h^T(y_n)\, H(\hat\theta_h(y_n))\, \mathrm{CCRB}(\hat\theta_h(y_n))\, H^T(\hat\theta_h(y_n))\, \hat\lambda_h(y_n), \tag{3.42}$$

where the Lagrange multiplier estimates $\hat\lambda_h(y_n)$ are based on the first-order conditions relating to the constraint $h$ (not $h$ and $f$).
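A minimal sketch of (3.41) and (3.42), assuming Python with NumPy and a hypothetical toy model chosen so that every estimate has a closed form: $n$ iid observations from $N(\theta, I_3)$, the constraint $f(\theta) = \theta_3 = 0$ ($k = 1$) and the null hypothesis $h(\theta) = \theta_1 - \theta_2 = 0$ ($r = 1$). The Lagrange multipliers are recovered from the first-order condition $s + H^T\lambda + F^T\mu = 0$; this sign convention is an assumption, but it does not affect the quadratic form.

```python
import math
import numpy as np

rng = np.random.default_rng(2)
n = 200
Y = np.array([0.3, 0.3, 0.0]) + rng.standard_normal((n, 3))   # data generated under H0
ybar = Y.mean(axis=0)

F = np.array([[0.0, 0.0, 1.0]])       # f(theta) = theta_3 = 0
H = np.array([[1.0, -1.0, 0.0]])      # h(theta) = theta_1 - theta_2 = 0
fim1 = np.eye(3)                      # single-sample FIM of N(theta, I)

_, _, Vt = np.linalg.svd(F)
U = Vt[1:].T                          # orthonormal basis of null(F)
ccrb = U @ np.linalg.solve(U.T @ fim1 @ U, U.T)

# H0-constrained MLE (closed form for this toy model) and the n-sample score there.
avg12 = 0.5 * (ybar[0] + ybar[1])
theta_h = np.array([avg12, avg12, 0.0])
s = n * (ybar - theta_h)

rao = s @ ccrb @ s / n                               # (3.41)

# Lagrange-multiplier form (3.42): solve s + H^T lam + F^T mu = 0, keep lam.
G = np.hstack([H.T, F.T])
lam = np.linalg.lstsq(G, -s, rcond=None)[0][:1]
rao_lm = lam @ H @ ccrb @ H.T @ lam / n

p_value = math.erfc(math.sqrt(rao / 2))              # chi^2_1 tail probability (r = 1 only)
print(rao, rao_lm, p_value)
```

For this linear Gaussian toy model both forms reduce to $n(\bar y_1 - \bar y_2)^2/2$; the `erfc` expression is the $\chi^2_1$ tail probability and applies only because $r = 1$.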
The result in (3.41) is consistent with the classical results in [63], although not explicitly in this form. For the hypothesis scenario in (3.39), the Lagrange multiplier statistic should be $-\frac{1}{n}\hat\lambda_h^T R_{h,\theta}^{-1} \hat\lambda_h$, where $R_{h,\theta}$ is defined by

$$\begin{bmatrix} P_\theta & * & * \\ * & * & * \\ * & * & R_{h,\theta} \end{bmatrix} = \begin{bmatrix} I(\theta) + F^T(\theta)F(\theta) & F^T(\theta) & H^T(\theta) \\ F(\theta) & 0 & 0 \\ H(\theta) & 0 & 0 \end{bmatrix}^{-1},$$

which is a variant of what appears in [63, equation (6.5)]. Finding the inverse using the Schur complement, it is clear that

$$R_{h,\theta} = -\Big(H(\theta)\big[D^{-1}(\theta) - D^{-1}(\theta)F^T(\theta)\big(F(\theta)D^{-1}(\theta)F^T(\theta)\big)^{-1}F(\theta)D^{-1}(\theta)\big]H^T(\theta)\Big)^{-1}$$

with $D(\theta) = I(\theta) + F^T(\theta)F(\theta)$. Recognizing the inner matrix as the Aitchison-Silvey-Crowder variant of the CCRB formula in (3.14) and substituting the CMLE for the parameter yields (3.42).
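The equivalence between the null-space form $U(U^T I U)^{-1}U^T$ and the Aitchison-Silvey-Crowder form with $D = I + F^T F$ can be spot-checked numerically. The sketch below assumes Python with NumPy, random Jacobians, and a deliberately singular FIM (so that only the loaded matrix $D$ is invertible); it also checks that the bottom-right block of the bordered inverse equals $-(H\,\mathrm{CCRB}\,H^T)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(3)
m, k, r = 5, 2, 1
B = rng.standard_normal((m, m - 1))
fim = B @ B.T                          # singular FIM (rank m - 1)
F = rng.standard_normal((k, m))        # constraint Jacobian
H = rng.standard_normal((r, m))        # Jacobian of the H0 function h

_, _, Vt = np.linalg.svd(F)
U = Vt[k:].T                           # orthonormal basis of null(F)
ccrb_u = U @ np.linalg.solve(U.T @ fim @ U, U.T)

D = fim + F.T @ F                      # loaded information matrix (K = I)
Di = np.linalg.inv(D)
ccrb_asc = Di - Di @ F.T @ np.linalg.solve(F @ Di @ F.T, F @ Di)
print(np.allclose(ccrb_u, ccrb_asc))

bordered = np.block([
    [D, F.T, H.T],
    [F, np.zeros((k, k)), np.zeros((k, r))],
    [H, np.zeros((r, k)), np.zeros((r, r))],
])
R = np.linalg.inv(bordered)[-r:, -r:]  # bottom-right block of the bordered inverse
print(np.allclose(R, -np.linalg.inv(H @ ccrb_u @ H.T)))
```

Loading with $F^TF$ does not change the bound because $F^TF$ vanishes on the null space of $F$, which is exactly where the CCRB lives.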

3.5.2  The  Wald  statistic 

Similarly, the Wald test statistic presented in section 2.5.2 is

$$h^T\big(g(\hat\xi(y_n))\big)\Big(H\big(g(\hat\xi(y_n))\big)\, G(\hat\xi(y_n))\, I_n^{-1}(\hat\xi(y_n))\, G^T(\hat\xi(y_n))\, H^T\big(g(\hat\xi(y_n))\big)\Big)^{-1} h\big(g(\hat\xi(y_n))\big)$$

for the testing problem in (3.40), where $g$ is localized about $\theta$. Following the steps in section 3.5.1, the corresponding Wald test statistic for (3.39) is

$$\omega(y_n) = n\, h^T(\hat\theta(y_n))\Big(H(\hat\theta(y_n))\,\mathrm{CCRB}(\hat\theta(y_n))\,H^T(\hat\theta(y_n))\Big)^{-1} h(\hat\theta(y_n)).$$

As with the Rao statistic, this general Wald statistic replaces the CRB in (2.10) with the CCRB, and under $H_0$, $\omega(y_n)$ is asymptotically $\chi^2_r$ in distribution.
This agrees with the classical result in [2, ‘A2i(#)’ on p. 240], where the Gorman-Hero-Aitchison-Silvey variant of the CCRB formula in (3.12) is used instead. (The general scenario when the FIM is singular is discussed in [2, section 3.9].)
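For a hypothetical toy model with $n$ iid $N(\theta, I_3)$ observations, $f(\theta) = \theta_3$ and $h(\theta) = \theta_1 - \theta_2$, the Wald statistic can be sketched as follows, assuming Python with NumPy. The $f$-constrained MLE has a closed form here, and the code solves against $H\,\mathrm{CCRB}\,H^T$ rather than inverting it, so it raises an error if that matrix is exactly singular.

```python
import math
import numpy as np

rng = np.random.default_rng(4)
n = 200
Y = np.array([0.3, 0.3, 0.0]) + rng.standard_normal((n, 3))
ybar = Y.mean(axis=0)

F = np.array([[0.0, 0.0, 1.0]])            # f(theta) = theta_3
H = np.array([[1.0, -1.0, 0.0]])           # h(theta) = theta_1 - theta_2
fim1 = np.eye(3)                           # single-sample FIM

_, _, Vt = np.linalg.svd(F)
U = Vt[1:].T
ccrb = U @ np.linalg.solve(U.T @ fim1 @ U, U.T)

theta_f = np.array([ybar[0], ybar[1], 0.0])   # f-constrained MLE (closed form here)
h_val = H @ theta_f
M = H @ ccrb @ H.T                             # must be nonsingular for the test to exist
wald = n * h_val @ np.linalg.solve(M, h_val)
p_value = math.erfc(math.sqrt(wald / 2))       # chi^2_1 tail probability (r = 1)
print(wald, p_value)
```

In this linear Gaussian case the Wald statistic equals $n(\bar y_1 - \bar y_2)^2/2$, the same value the Rao statistic takes; for nonlinear models the two agree only asymptotically.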

A requirement for the existence of this statistic is that $H(\theta)\,\mathrm{CCRB}(\theta)\,H^T(\theta)$ be nonsingular. This is not an additional requirement: for the hypothesis to be valid, the hypothesis testing function itself must be identifiable.

3.6  Discussion 

The  previous  sections  have  established  that  the  theory  of  the  constrained  CRB 
is  equivalent  to  that  of  the  CRB.  The  majority  of  the  proofs,  besides  their  novelty  in 
the  recent  literature  on  the  CCRB,  essentially  rely  on  the  generation  of  an  implicit 
function  that  translates  the  constrained  parametric  problem  into  an  unconstrained 
parametric  problem,  for  which  the  theory  of  the  CRB  is  well-established. 

This theory has been extended to the CCRB, in particular, for identifiability under constraints; for the linear model with linear constraints (already well-documented in the literature but perhaps not with reference to this CCRB); for the constrained maximum likelihood problem, its asymptotic normality and the method of scoring; as well as for the Wald and Rao hypothesis tests. This list is by no means exhaustive. There exists recent research, for example, related to biased estimation with the CCRB [9], and an extension of the CCRB for complex-valued parameters [32]. In addition, there are other areas in mathematical statistics that, as far as I know, have not yet been connected to the CCRB, including a geometric interpretation of the CCRB (possibly in the manner of [60] or an extension of [5]), biasedness issues with constrained estimation, confidence intervals or sets, and the plausibility of a Bayesian version of the CCRB.

While  the  primary  contributions  of  this  thesis  are  the  theoretical  results  and 
their proofs in this chapter, this should not discount the practical applications of these ideas. From the practitioner's viewpoint, this research has produced a number of useful tools. For example, to test local identifiability, in addition to the classic result of Rothenberg (theorem 3.23), theorem 3.24 may be used. Likewise, for strict identifiability, theorem 3.29 is useful. And the method of scoring in (3.36) adds to the list of constrained maximum likelihood methods. These and others will be applied in a communications context in the next chapter.
Chapter  4


Applications  of  the  CCRB  in  Communications  Models 

Communications models in statistical signal processing often have the structure

$$y(n) = \mu(n, \theta_n) + w(n), \quad n = 0, \ldots, N, \tag{4.1}$$

where $y(n)$ are the received observations of some model $\mu(n, \theta_n)$ affected (additively) by noise $w(n)$ over a series of time samples $n$. The parameters may or may not depend on the number of time samples, and the noise may or may not be independent (or even normally distributed). This general model encompasses a number of areas of signal processing, e.g., communications, sonar, radar, speech, imaging, control, sensing, and networks. In every one of these areas, there exist models where parametric equality constraints are of interest to practitioners in the field. The CCRB has proven useful as a performance analysis tool in localization [6], watermarking security [15], tomography [28], source bearing and symbol estimation [59], space-time block coding [42], and even a variant of least squares [46].

In this chapter, we shall detail just two rather general signal processing communications models: the convolutive channel and the calibrated array, each with unknown signal and channel components. Because the source and channel interact multiplicatively in the models, numerous variations of these two basic models are possible (and necessary). We shall formulate constraints for just a few of these variations and connect the results of this chapter to the theory developed in Chapter 3. For the sake of coherence in the presentation, the lengthier proofs of theorems in this chapter are relegated to appendix C.

In section 4.1, the convolutive mixture model with deterministic parameters is presented using a variety of descriptions. In addition, several terms are defined in section 4.1.1.2 that are useful to characterize conditions on a notion of near-identifiability presented in section 4.1.2 and conditions on the Fisher information derived in section 4.1.3. The corresponding complex-valued FIM (CFIM) is defined in 4.1.3.2 and important properties are given in 4.1.3.3. These properties are critical to understanding the particular inherent ambiguities in the basic convolutive mixture model and to determining the class of constraints necessary to obtain a CCRB, as discussed in 4.1.4.1. Two constraint models, the norm constraint (in 4.1.4.2) and the semiblind constraint (in 4.1.4.3), serve as further validation of the CCRB method by reproducing prior results in the literature, while another constraint model, the combination of the semiblind and unit modulus constraints in 4.1.4.4, demonstrates an important constraint model that did not previously exist in the literature.

In section 4.2, a special case of the convolutive mixture model, called the calibrated array model, is considered. The FIM for this (sub)model and its properties are detailed in section 4.2.1, and various constraints are considered in section 4.2.2. As before, a number of the constraint models are validations of the CCRB method compared with previous results in the literature (in 4.2.2.1, 4.2.2.2 and 4.2.2.5). The CCRB approach is also used to detail constraint models where the constraints are inappropriate either because they are over-determined (in 4.2.2.3) or because they are under-determined (in 4.2.2.4). As with the more general convolutive mixture model, a constraint model case is presented in 4.2.2.6 that did not previously exist in the literature.

4.1  Convolutive  Mixture  Model 

The complex baseband representation of a multi-input, multi-output (MIMO) finite impulse response (FIR) system, or the convolutive mixture model, may be written as

$$y_m(n) = \sum_{k=1}^{K} s^{(k)}(n) * h_m^{(k)}(n) + w_m(n) = \sum_{k=1}^{K}\sum_{l=0}^{L_k} s^{(k)}(n-l)\, h_m^{(k)}(l) + w_m(n) \tag{4.2}$$

for the $n$th observation of the $m$th channel, where the model consists of $K$ sources, $M$ channels ($1 \le m \le M$) with maximal order $L_k$ for the $k$th source (or $L_k + 1$ taps for the $k$th source), $N$ output samples ($0 \le n \le N - 1$) per channel and $N + L_k$ input samples per channel per source. For each source, the model can be seen in figure 4.1. The dimensions $K$, $M$, $L_k$ for $k = 1, \ldots, K$, and $N$ are all assumed known. The elements $s^{(k)}(n)$ denote the scalar, complex value of the $k$th input source at time $n$; the elements $h_m^{(k)}(l)$ denote the scalar, complex value of the $l$th filter coefficient (or lag) of the $k$th source processed by the $m$th channel. Both the signal inputs and channel coefficients are treated as deterministic unknowns, i.e., parameters having true values that can be estimated. The noise elements $w_m(n)$ are commonly assumed to be zero-mean, complex-valued, circular Gaussian iid over space and time (and channel) with known variance $\sigma^2$.¹

Figure 4.1: Finite Impulse Response (tapped delay line) model.


The convolution aspect of this model is often used to characterize the intersymbol interference in direct sequence CDMA (code-division multiple-access) over dispersive channels [44]. This occurs, for example, when the propagation time of the signal is shorter than the coherence time of the channel in wideband signals. The varying channel order lengths represent the different coherence times of the different source-channel links. This convolution also applies to scenarios where reverberation of the transmitted signal is present, which occurs when the communication is echoed. Additionally, the additive aspect over the sources represents the multiuser interference (e.g., the cocktail party problem) common in communications systems today. Moreover, this convolutive mixture model incorporates a number of important model subclasses:

1. the convolutive single-input, multi-output (SIMO) model when $K = 1$,

2. the convolutive single-input, single-output (SISO) model when $K = M = 1$,

3. the memoryless, instantaneous mixing model when $L_k = 0$ for all $k = 1, \ldots, K$, and

4. the calibrated model (with constraints and a transformation of parameters).

¹ Under the Gaussian assumption, an unknown variance parameter decouples from the unknown mean parameters in the Fisher information. So, while the noise power will affect the performance potential, whether $\sigma^2$ is known or not does not affect how the CRB (CCRB) depends on the parameters. It does, of course, affect estimation.

For  generality,  we  consider  the  full  (MIMO)  model,  but  the  results  contained  in  this 
section  also  apply  to  the  preceding  subclasses. 


4.1.1  Equivalent  Convolutive  Mixture  Models 


4.1.1.1  The  Vector-Matrix  Model


In vector-matrix notation, the model may be written as

$$y = \sum_{k=1}^{K} H_M^{(k)} s^{(k)} + w = H_M s + w \tag{4.3}$$

where the observations are contained in $y^T = [y_1^T, \ldots, y_M^T] \in \mathbb{C}^{NM}$ and the observations for each channel are contained in $y_m^T = [y_m(0), \ldots, y_m(N-1)] \in \mathbb{C}^N$ for $m = 1, \ldots, M$, $s^{(k)T} = [s^{(k)}(-L_k), \ldots, s^{(k)}(N-1)] \in \mathbb{C}^{N+L_k}$ represents the input sequence of the $k$th source, and the channel matrix is represented by the $NM \times (N + L_k)$ matrix

$$H_M^{(k)} = \begin{bmatrix} H_1^{(k)} \\ \vdots \\ H_M^{(k)} \end{bmatrix} \tag{4.4}$$
where the $m$th channel submatrix over the $k$th source is given by the $N \times (N + L_k)$ matrix

$$H_m^{(k)} = \begin{bmatrix} h_m^{(k)}(L_k) & \cdots & h_m^{(k)}(0) & 0 & \cdots & 0 \\ 0 & h_m^{(k)}(L_k) & \cdots & h_m^{(k)}(0) & \ddots & \vdots \\ \vdots & \ddots & \ddots & & \ddots & 0 \\ 0 & \cdots & 0 & h_m^{(k)}(L_k) & \cdots & h_m^{(k)}(0) \end{bmatrix} \tag{4.5}$$

for $m = 1, \ldots, M$, and the noise vector is $w^T = [w_1^T, \ldots, w_M^T] \in \mathbb{C}^{MN}$ with the noise for each channel given by $w_m^T = [w_m(0), \ldots, w_m(N-1)] \in \mathbb{C}^N$. This particular vector-matrix ordering of the model can also be represented as

$$y = \sum_{k=1}^{K} \left(I_{M\times M} \otimes S^{(k)}\right) h^{(k)} + w = S_M h + w \tag{4.6}$$

where the input samples are now organized into an $N \times (L_k + 1)$ Toeplitz matrix

$$S^{(k)} = \begin{bmatrix} s^{(k)}(0) & s^{(k)}(-1) & \cdots & s^{(k)}(-L_k) \\ s^{(k)}(1) & s^{(k)}(0) & \cdots & s^{(k)}(-L_k+1) \\ \vdots & \vdots & \ddots & \vdots \\ s^{(k)}(N-1) & s^{(k)}(N-2) & \cdots & s^{(k)}(N-L_k-1) \end{bmatrix}$$

with $\otimes$ being the Kronecker product, and the channel elements are vectorized as $h^{(k)T} = [h_1^{(k)T}, \ldots, h_M^{(k)T}] \in \mathbb{C}^{M(L_k+1)}$ with $h_m^{(k)T} = [h_m^{(k)}(0), \ldots, h_m^{(k)}(L_k)] \in \mathbb{C}^{L_k+1}$ for each $k = 1, \ldots, K$ and $m = 1, \ldots, M$. The purpose of these alternative vector-matrix forms will become evident in the development of the Fisher information (see section 4.1.3) for the original model in (4.2).
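To see that (4.2), (4.3) and (4.6) describe the same noise-free output, the sketch below, assuming Python with NumPy, builds the mean in all three ways for random complex parameters. The helper names `direct_conv`, `channel_matrix`, `source_matrix` and `crandn` are hypothetical; the array `s_list[k]` stores $s^{(k)}(-L_k), \ldots, s^{(k)}(N-1)$, so index `j` holds $s^{(k)}(j - L_k)$.

```python
import numpy as np

def direct_conv(h_list, s_list, L, N):
    """y_m(n) = sum_k sum_l s^(k)(n - l) h_m^(k)(l), stacked as [y_1^T, ..., y_M^T]^T."""
    M = h_list[0].shape[0]
    y = np.zeros((M, N), dtype=complex)
    for hk, sk, Lk in zip(h_list, s_list, L):
        for m in range(M):
            for n in range(N):
                for l in range(Lk + 1):
                    y[m, n] += sk[n - l + Lk] * hk[m, l]
    return y.reshape(-1)

def channel_matrix(hk, Lk, N):
    """H_M^(k) of (4.4): M stacked N x (N + L_k) banded Toeplitz blocks (4.5)."""
    M = hk.shape[0]
    Hk = np.zeros((M * N, N + Lk), dtype=complex)
    for m in range(M):
        for n in range(N):
            for l in range(Lk + 1):
                Hk[m * N + n, n + Lk - l] = hk[m, l]   # coefficient of s^(k)(n - l)
    return Hk

def source_matrix(sk, Lk, N):
    """S^(k): N x (L_k + 1) Toeplitz matrix with entries s^(k)(n - l)."""
    return np.array([[sk[n - l + Lk] for l in range(Lk + 1)] for n in range(N)])

rng = np.random.default_rng(5)

def crandn(*shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

M, N, L = 3, 8, [1, 2]                       # K = 2 sources with orders L_1 = 1, L_2 = 2
h_list = [crandn(M, Lk + 1) for Lk in L]     # h_list[k][m, l] = h_m^(k)(l)
s_list = [crandn(N + Lk) for Lk in L]

y_conv = direct_conv(h_list, s_list, L, N)
y_43 = sum(channel_matrix(hk, Lk, N) @ sk for hk, sk, Lk in zip(h_list, s_list, L))
y_46 = sum(np.kron(np.eye(M), source_matrix(sk, Lk, N)) @ hk.reshape(-1)
           for hk, sk, Lk in zip(h_list, s_list, L))
print(np.allclose(y_conv, y_43), np.allclose(y_conv, y_46))
```

The form (4.3) is linear in the sources for fixed channels, while (4.6) is linear in the channels for fixed sources; this is what makes each convenient for differentiating the mean with respect to one group of parameters.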

4.1.1.2  The  Z-transform  model

Yet another alternative representation of this model, necessary to introduce relevant characteristics of the model parameters and often referred to as being in the Z-transform domain, is the $M$-variate stationary process

$$y(n) = [H(z)]\, s(n) + w(n) \tag{4.7}$$

where $H(z)$ is an $M \times K$ (global) transfer function (polynomial) defined by

$$H(z) = \left[H^{(1)}(z), \ldots, H^{(K)}(z)\right]$$

for nonzero $z \in \mathbb{C}^*$ (including $\infty$), with $H^{(k)}(z)$ being the $k$th source transfer function

$$H^{(k)}(z) = \sum_{l=0}^{L_k} h^{(k)}(l)\, z^{-l}$$

and with $h^{(k)}(l)^T = [h_1^{(k)}(l), \ldots, h_M^{(k)}(l)] \in \mathbb{C}^M$ for $l = 0, \ldots, L_k$ and $k = 1, \ldots, K$; $s(n)^T = [s^{(1)}(n), s^{(2)}(n), \ldots, s^{(K)}(n)] \in \mathbb{C}^K$ for $n = 0, \ldots, N-1$; and $y(n)$ and $w(n)$ defined to match the models in (4.3) and (4.6).

Hence, $H(z)$ is a polynomial matrix of the backward shift $z^{-1}$. The $k$th source transfer function is said to have a common zero if there exists a nonzero $z_0 \in \mathbb{C}^*$ such that $H^{(k)}(z_0) = 0$. If the transfer function does not have a common zero, then the polynomials $H_m^{(k)}(z) = \sum_{l=0}^{L_k} h_m^{(k)}(l)\, z^{-l}$, for $m = 1, \ldots, M$, are said to be coprime. The global transfer function is said to be reducible if there exists a nonzero $z_0 \in \mathbb{C}^*$ such that $\operatorname{rank}(H(z_0)) < K$; if not, it is said to be irreducible. The global transfer function is said to be column-reduced if $\operatorname{rank}\big(\lim_{z \to 0} H(z) Z\big) = K$, where $Z = \operatorname{diag}\{z^{L_1}, \ldots, z^{L_K}\}$. For an irreducible and column-reduced transfer function, called a minimum polynomial basis [35], this necessarily implies that $M > K$. The connection between irreducibility in the multi-source case and not having common zeros in the single-source case is made clear in the following result.
Theorem 4.1. $H(z)$ is irreducible if and only if $\sum_{k=1}^{K} \alpha_k H^{(k)}(z)$ has no common zeros for any nontrivial complex-valued collection $\alpha_1, \ldots, \alpha_K$.

Proof. $H(z)$ is reducible if and only if there exists some point $z_0 \in \mathbb{C}^*$ such that $\operatorname{rank}(H(z_0)) < K$, which holds if and only if there exist nontrivial $\beta_1, \ldots, \beta_K \in \mathbb{C}$ such that $\sum_{k=1}^{K} \beta_k H^{(k)}(z_0) = 0$, i.e., if and only if $\sum_{k=1}^{K} \beta_k H^{(k)}(z)$ has a common zero.

□
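For a single source ($K = 1$), Theorem 4.1 reduces irreducibility to the absence of a common zero among the channel polynomials, which can be checked numerically. The sketch below assumes Python with NumPy; `common_zeros` is a hypothetical helper that multiplies each $H_m(z)$ by $z^{L}$, finds the roots of the first channel with `np.roots`, and keeps those at which every other channel vanishes up to a tolerance. It assumes the first channel has $h_1(0) \neq 0$ so that `np.roots` sees the full degree.

```python
import numpy as np

def common_zeros(h, atol=1e-8):
    """Nonzero common zeros of H_m(z) = sum_l h[m, l] z^{-l} over all rows m of h.

    z^L H_m(z) is an ordinary polynomial in z with coefficients
    h[m, 0], ..., h[m, L] (highest power first), as np.roots expects.
    """
    zeros = []
    for z0 in np.roots(h[0]):
        if abs(z0) > atol and all(abs(np.polyval(hm, z0)) < atol for hm in h[1:]):
            zeros.append(z0)
    return zeros

# Both channels share the factor (1 - 0.5 z^{-1}).
h_shared = np.array([np.convolve([1.0, -0.5], [1.0, 0.3]),
                     np.convolve([1.0, -0.5], [2.0, -1.0])])
# No shared factor.
h_coprime = np.array([np.convolve([1.0, -0.5], [1.0, 0.3]),
                      np.convolve([1.0, 0.2], [1.0, 0.5])])

print(common_zeros(h_shared))    # a single common zero near 0.5
print(common_zeros(h_coprime))   # empty list
```

The tolerance is a pragmatic choice; for noisy channel estimates, near-common zeros matter as much as exact ones, so a graded measure such as the distance between root sets would be more informative.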

The connection between the models in (4.3) and (4.7) is that the $K$-source, $M$-channel matrix $H_M$ is a column-rotated, generalized Sylvester matrix of the block Toeplitz matrix

$$\begin{bmatrix} H(L) & H(L-1) & \cdots & H(0) & & \\ & \ddots & \ddots & & \ddots & \\ & & H(L) & H(L-1) & \cdots & H(0) \end{bmatrix}$$

where $H(l) = [h^{(1)}(l), \ldots, h^{(K)}(l)]$, with $h^{(k)}(l)$ a null vector when $l > L_k$, and $L = \max_k L_k$. Furthermore, under certain conditions, the rank of $H_M$ is determined by the characterization of the basis $H(z)$.

Theorem 4.2. Assume $M > K$ and $N > \sum_{k=1}^{K} L_k$. Then $H_M$ is full column rank if and only if $H(z)$ is a minimum polynomial basis.

Proof. This can be shown by a variant of the proof in Loubaton and Moulines [44, theorem 1]. □
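Theorem 4.2 can be illustrated in the single-source case, assuming Python with NumPy, with two hypothetical two-channel examples: a coprime pair and a pair sharing the zero $z = 0.5$. `channel_matrix` is a hypothetical helper that builds $H_M$ from (4.4) and (4.5); the explicit rank tolerance is a pragmatic choice.

```python
import numpy as np

def channel_matrix(h, N):
    """Single-source H_M from (4.4)-(4.5): M stacked N x (N + L) banded Toeplitz blocks."""
    M, L = h.shape[0], h.shape[1] - 1
    Hm = np.zeros((M * N, N + L))
    for m in range(M):
        for n in range(N):
            for l in range(L + 1):
                Hm[m * N + n, n + L - l] = h[m, l]   # multiplies s(n - l)
    return Hm

N = 6                                                        # N > L = 2 and M = 2 > K = 1
h_coprime = np.array([[1.0, -0.2, -0.15], [1.0, 0.7, 0.1]])  # no common zero
h_shared = np.array([[1.0, -0.2, -0.15], [2.0, -2.0, 0.5]])  # common zero at z = 0.5

for name, h in [("coprime", h_coprime), ("common zero", h_shared)]:
    Hm = channel_matrix(h, N)
    print(name, np.linalg.matrix_rank(Hm, tol=1e-10), "of", Hm.shape[1], "columns")
```

The coprime pair gives full column rank $N + L = 8$, while the shared zero costs one rank: the sequence $s(n) = z_0^{\,n}$ with $z_0 = 0.5$ is annihilated by both channels.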

The input sequence $s^{(k)}$ is said to have $p_k$ modes² if it can be written as a linear combination

$$s^{(k)}(n) = \sum_{i=1}^{p_k} c_i\, m_i^{\,n+L_k},$$

where $c_i$, $i = 1, \ldots, p_k$, are complex-valued weights and $m_i$, $i = 1, \ldots, p_k$, are the $\mathbb{C}^*$-valued roots of the polynomial

$$a(0) + a(1) z^{-1} + a(2) z^{-2} + \cdots + a(p_k) z^{-p_k},$$

whose coefficients satisfy

$$\sum_{j=0}^{p_k} s^{(k)}(i+j)\, a(j) = 0$$

for $i = -L_k, \ldots, N - p_k - 1$. Hence, the Toeplitz matrix

$$S^{(k)}(n) = \begin{bmatrix} s^{(k)}(n) & s^{(k)}(n-1) & \cdots & s^{(k)}(-L_k) \\ s^{(k)}(n+1) & s^{(k)}(n) & \cdots & s^{(k)}(-L_k+1) \\ \vdots & \vdots & \ddots & \vdots \\ s^{(k)}(N-1) & s^{(k)}(N-2) & \cdots & s^{(k)}(N-n-L_k-1) \end{bmatrix} \tag{4.8}$$

has rank $\min\{N - n,\, p_k,\, n + L_k + 1\}$ [76, lemma 1]. Note that $S^{(k)}(0) = S^{(k)}$ and $S^{(k)}(-L_k) = s^{(k)}$ from section 4.1.1.1. If one of the modes of $s^{(k)}$, say $m_v$, is a common zero of the channel transfer function $H^{(k)}(z)$, then the channel makes no distinction between $s^{(k)}$ and $\tilde{s}^{(k)}$ defined by

$$\tilde{s}^{(k)}(n) = \sum_{\substack{i=1 \\ i \neq v}}^{p_k} c_i\, m_i^{\,n+L_k},$$

even though $\tilde{s}^{(k)} \neq s^{(k)}$. Ironically, if a channel lacks sufficient diversity, the more modes an input has, the greater the risk of losing information about the input, although the loss is potentially less meaningful depending on the weights.

² There are a number of alternative definitions of modes [30, 43], [35, p. 168].
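The rank statement for (4.8) can be checked on synthetic inputs. The sketch below, assuming Python with NumPy, builds sequences with $p$ distinct, evenly spaced unit-modulus modes and random complex weights (hypothetical choices made for good numerical conditioning) and compares the numerical rank of $S^{(k)}(n)$ with $\min\{N - n, p, n + L + 1\}$. The helpers `toeplitz_S` and `modal_sequence` are hypothetical names.

```python
import numpy as np

def toeplitz_S(s, L, n):
    """S(n) from (4.8): (N - n) x (n + L + 1) with entries s(n + i - j); s[t] holds s(t - L)."""
    N = len(s) - L
    return np.array([[s[n + i - j + L] for j in range(n + L + 1)] for i in range(N - n)])

def modal_sequence(p, N, L, rng):
    """s(t) = sum_i c_i m_i^(t + L) for t = -L, ..., N - 1, with p distinct unit-modulus modes."""
    modes = np.exp(2j * np.pi * (np.arange(p) + 0.25) / p)
    weights = rng.standard_normal(p) + 1j * rng.standard_normal(p)
    idx = np.arange(N + L)                          # idx = t + L
    return (weights * modes ** idx[:, None]).sum(axis=1)

rng = np.random.default_rng(6)
L, N = 2, 12
results = []
for p, n in [(3, 0), (3, 3), (5, 3), (20, 3)]:
    S = toeplitz_S(modal_sequence(p, N, L, rng), L, n)
    rank = np.linalg.matrix_rank(S, tol=1e-8 * np.linalg.norm(S, 2))
    results.append((p, n, rank, min(N - n, p, n + L + 1)))
    print(*results[-1])
```

The last case has more modes than samples, so the sequence is effectively generic and the rank is limited by the matrix dimensions instead of by $p$.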

Theorem 4.3. The matrix

$$S(n) = \left[S^{(1)}(n)\;\; S^{(2)}(n)\;\; \cdots\;\; S^{(K)}(n)\right] \tag{4.9}$$

is full column rank only if $N \ge (K+1)n + K + \sum_{j=1}^{K} L_j$, $p_{\text{total}} \ge Kn + K + \sum_{k=1}^{K} L_k$, and $p_k \ge L_k + 1 + n$ for each $k = 1, \ldots, K$; and it is full column rank if $N \ge (K+1)n + K + (K+2)\sum_{j=1}^{K} L_j$, $p_{\text{total}} \ge Kn + K + (K+1)\sum_{k=1}^{K} L_k$, and $p_k \ge L_k + 1 + n + \sum_{j=1}^{K} L_j$.

Proof. This is a variation of the results in [76, 1, 48, 49].

In particular, for $n = 0$, $S(0)$ requires $N \ge K + \sum_{j=1}^{K} L_j$, $p_{\text{total}} \ge K + \sum_{j=1}^{K} L_j$, and $p_k \ge L_k + 1$, for each $k = 1, \ldots, K$, to be full rank. On the other hand, $S(0)$ is full rank if $N \ge K + (K+2)\sum_{j=1}^{K} L_j$, $p_{\text{total}} \ge K + (K+1)\sum_{k=1}^{K} L_k$, and $p_k \ge L_k + 1 + \sum_{j=1}^{K} L_j$.

j= 1  fc=i  i=i 

Before continuing to the development of the Fisher information for this convolutive mixture model, it is relevant to consider a notion of identifiability using the concepts of the Z-transform model in (4.7). However, this next section is a brief aside that may be skipped if not of interest to the reader.


4.1.2  Strict  Identifiability 

The $K$-user, $M$-channel FIR system in (4.2) or (4.3) is said to be strictly identifiable (SI) if and only if

$$H_M s = \tilde{H}_M \tilde{s} \iff \tilde{H}(z) = H(z) A \ \text{ and } \ \tilde{s}(n) = A^{-1} s(n)$$

for some nonsingular matrix $A$. The reverse implication always holds, so strict identifiability depends only on the forward implication. The term strict identifiability is a misnomer. As is clear from the definition, the deterministic parameters are not "identifiable" when they are strictly identifiable. Instead, the parameters are identifiable up to some minimal ambiguity. When this situation occurs, it is possible to treat the channel (signal) components statistically to truly identify the deterministic signal (channel) parameters using stochastic approaches, e.g., the subspace method [24, 44]. In the context of this thesis, where the parameters are treated not as random but as deterministic, the reduction to a minimal ambiguity set also reduces the number of constraints necessary to eliminate it.

Strict identifiability in the convolutive channel model, it shall be shown, has some interesting connections with the Fisher information matrix. This is not surprising, given the results in section 3.2. It is perhaps also intuitive to expect that, as strict identifiability is a notion of near-identifiability, the corresponding Fisher information will satisfy some notion of near-regularity.

Abed-Meraim and Hua [1] developed necessary and sufficient conditions for strict identifiability in terms of characteristics in the Z-transform model, i.e., no channel zeros, the number of signal modes, more sensors than sources, etc. The following two theorems are presented without proof, as they were in [1]. Proofs I developed can be found in [48].

Theorem 4.4 (SI necessary conditions). The $M$-channel $K$-source FIR system is strictly identifiable only if³

(a) $H(z)$ is irreducible and column-reduced,

(b) $p_{\text{total}} \ge K + \sum_{j=1}^{K} L_j$,

(c) $p_k \ge L_k + 2$ for $k = 1, \ldots, K$, or $p_k \ge 1$ if $L_k = 0$, and

(d) $N \ge K + \sum_{j=1}^{K} L_j$.

³ In [1], condition (a) omits the column-reducedness requirement, condition (c) did not include the special case, and condition (d) is originally (and I think erroneously) stated $N \ge 2K + \sum_{j=1}^{K} L_j$.

Theorem 4.5 (SI sufficiency conditions). The $M$-channel $K$-source FIR system is strictly identifiable if

(a) $H(z)$ is irreducible and column-reduced,

(b) $p_{\text{total}} \ge K + (K+1)\sum_{j=1}^{K} L_j$,

(c) $p_k \ge L_k + 1 + \sum_{j=1}^{K} L_j$ for $k = 1, \ldots, K$, and

(d) $N \ge K + (K+2)\sum_{j=1}^{K} L_j$.

Yet other notions of near-identifiability in the convolutive channel model exist in the literature, e.g., cross-relation-based identifiability [43], which is equivalent to strict identifiability in the SIMO case [30]. The necessary and sufficient conditions presented here are independent of the number of channels, although one should expect that increasing the number of channels would increase channel diversity, thereby weakening the requirements on the other channel or source characteristics. Hence, theorems 4.4 and 4.5 are in some sense conditions for the least diversified, strictly identifiable scenario $M = K + 1$. Table 4.1 shows the growth in the necessary and sufficient data sizes as the number of users increases, for a fixed number of channels and fixed channel orders.
Table 4.1: Necessary and sufficient data sizes for strict identifiability ($M = 6$ channels, $L_k = 5$ for all sources)

Number of sources K        1     2     3     4     5
Necessary data size N      6    12    18    24    30
Sufficient data size N    16    42    78   124   180
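The data-size rows of Table 4.1 follow from condition (d) of Theorems 4.4 and 4.5. The short Python sketch below assumes the thresholds $N \ge K + \sum_j L_j$ and $N \ge K + (K+2)\sum_j L_j$ and evaluates them for $L_k = 5$; `si_data_sizes` is a hypothetical helper name.

```python
def si_data_sizes(L):
    """(necessary, sufficient) data size N from condition (d) of Theorems 4.4 and 4.5."""
    K, total = len(L), sum(L)
    return K + total, K + (K + 2) * total

rows = [si_data_sizes([5] * K) for K in range(1, 6)]
print("necessary :", [r[0] for r in rows])   # [6, 12, 18, 24, 30]
print("sufficient:", [r[1] for r in rows])   # [16, 42, 78, 124, 180]
```

The necessary size grows linearly in $K$, while the sufficient size grows quadratically because of the $(K+2)\sum_j L_j$ term, which explains the widening gap in the table.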

4.1.3  The  Fisher  information  of  the  convolutive  mixture  model 
4.1.3.1  Complex-valued  Fisher  information


Before presenting the Fisher information on the model in (4.2), certain details about Fisher information matrices on complex-valued parameters are necessary. The structure of the FIM depends on the ordering of the parameters. For complex-valued parameters, the FIM parameter vector consists of the real and imaginary parts of the parameters. If the complex-valued parameters are collected in the vector $\vartheta$, the (FIM) real parameter vector is defined to be $\theta^T = [\operatorname{Re}(\vartheta^T), \operatorname{Im}(\vartheta^T)]$, and the real-valued parameter FIM has the structure

$$I(\theta) = \begin{bmatrix} E_1(\vartheta) & -E_2(\vartheta) \\ E_2(\vartheta) & E_1(\vartheta) \end{bmatrix}, \tag{4.10}$$

then the complex-valued parameter Fisher information matrix (CFIM) may be defined as⁴

$$\mathcal{I}(\vartheta) = \mathrm{E}\, f(x; \vartheta) f^H(x; \vartheta) = \tfrac{1}{2}\left(E_1(\vartheta) + j\, E_2(\vartheta)\right),$$

where $f(x; \vartheta) = \dfrac{\partial \log p(x; \vartheta)}{\partial \vartheta^*}$.⁵ A number of properties of the real-valued parameter FIM can be gleaned from properties of the CFIM.

⁴ This CFIM is a submatrix of the complex-valued parameter FIM developed by van den Bos [70], which would be the preferred FIM for use in a performance metric or in applying constraints [32]. However, in this and the next sections, the CFIM presented here is only used to obtain properties relevant to the real-valued parameter FIM.

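⁵ The complex derivative is defined to be $\dfrac{\partial}{\partial \vartheta^*} = \dfrac{1}{2}\left(\dfrac{\partial}{\partial \operatorname{Re}(\vartheta)} + j\, \dfrac{\partial}{\partial \operatorname{Im}(\vartheta)}\right)$ in this thesis.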
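Theorem 4.6. The null space of the FIM of the form (4.10) has dimension exactly twice the dimension of the null space of the corresponding CFIM, i.e.,

$$\operatorname{nullity}(I(\theta)) = 2 \cdot \operatorname{nullity}(\mathcal{I}(\vartheta)).$$

Proof. Note that

$$\begin{bmatrix} E_1 & -E_2 \\ E_2 & E_1 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = 0$$

if and only if $E_1 a - E_2 b = 0$ and $E_2 a + E_1 b = 0$, i.e., if and only if $(E_1 + jE_2)(a + jb) = (E_1 a - E_2 b) + j(E_2 a + E_1 b) = 0$. Also,

$$\begin{bmatrix} E_1 & -E_2 \\ E_2 & E_1 \end{bmatrix} \begin{bmatrix} a & -b \\ b & a \end{bmatrix} = 0$$

if and only if, in addition, $(E_1 + jE_2)(-b + ja) = 0$, which holds automatically because $-b + ja = j(a + jb)$. Finally, the dimension of $\operatorname{span}\{[a^T, b^T]^T, [-b^T, a^T]^T\}$ is 2 unless $a = b = 0$; however, the dimension of $\operatorname{span}\{a + jb, -b + ja\}$ is only 1, again since $-b + ja = j(a + jb)$.

□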
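Theorem 4.6 can be spot-checked numerically. The sketch below, assuming Python with NumPy, builds a random rank-deficient Hermitian CFIM $\mathcal{I} = Q^HQ$, forms $E_1 = 2\operatorname{Re}\mathcal{I}$ and $E_2 = 2\operatorname{Im}\mathcal{I}$ so that $\mathcal{I} = \tfrac{1}{2}(E_1 + jE_2)$, and compares nullities using a relative singular-value threshold (a hypothetical tolerance choice in the helper `nullity`).

```python
import numpy as np

def nullity(X, rel_tol=1e-9):
    """Number of columns minus numerical rank, with a relative singular-value threshold."""
    sv = np.linalg.svd(X, compute_uv=False)
    return X.shape[1] - int(np.sum(sv > rel_tol * sv.max()))

rng = np.random.default_rng(7)
A = rng.standard_normal((8, 3)) + 1j * rng.standard_normal((8, 3))
B = rng.standard_normal((3, 5)) + 1j * rng.standard_normal((3, 5))
Q = A @ B                                # 8 x 5, rank 3
cfim = Q.conj().T @ Q                    # Hermitian PSD CFIM with nullity 2
E1, E2 = 2 * cfim.real, 2 * cfim.imag    # so that cfim = (E1 + j E2) / 2
fim_real = np.block([[E1, -E2], [E2, E1]])

print(nullity(cfim), nullity(fim_real))  # expected: 2 and 4
```

Each complex null vector $a + jb$ contributes the two real null vectors $[a^T, b^T]^T$ and $[-b^T, a^T]^T$, which is the doubling the theorem describes.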


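A version of Bangs' formula [36, a variant of (15.52) on p. 525] can be developed for complex-valued parameters, i.e.,

$$\mathcal{I}_{ij}(\vartheta) = \operatorname{tr}\!\left[C^{-1}(\vartheta) \frac{\partial C(\vartheta)}{\partial \vartheta_i^*} C^{-1}(\vartheta) \frac{\partial C(\vartheta)}{\partial \vartheta_j}\right] + \left(\frac{\partial \mu(\vartheta)}{\partial \vartheta_i}\right)^H C^{-1}(\vartheta) \frac{\partial \mu(\vartheta)}{\partial \vartheta_j}$$

for any observations $y \sim \mathcal{CN}(\mu(\vartheta), C(\vartheta))$.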


4.1.3.2  CFIM  for  the  convolutive  mixture  model


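For the convolutive mixture model in (4.3) and (4.6), only the mean vector depends on the unknown parameters (the noise covariance is $\sigma^2 I$), hence

$$\mathcal{I}(\vartheta) = \frac{1}{\sigma^2}\left(\frac{\partial \mu}{\partial \vartheta^T}\right)^H \frac{\partial \mu}{\partial \vartheta^T}.$$

The mean vector is

$$\mu(\vartheta) = \sum_{k=1}^{K} H_M^{(k)} s^{(k)} = \sum_{k=1}^{K} \left(I_{M\times M} \otimes S^{(k)}\right) h^{(k)}.$$

Therefore, if the complex-valued parameter vector in the model in (4.3) is defined by

$$\vartheta = \begin{bmatrix} h^{(1)} \\ s^{(1)} \\ \vdots \\ h^{(K)} \\ s^{(K)} \end{bmatrix},$$

then the complex-valued FIM of the model can be shown to be

$$\mathcal{I}(\vartheta) = \frac{1}{\sigma^2}\begin{bmatrix} Q_1^H Q_1 & Q_1^H Q_2 & \cdots & Q_1^H Q_K \\ Q_2^H Q_1 & Q_2^H Q_2 & \cdots & Q_2^H Q_K \\ \vdots & \vdots & \ddots & \vdots \\ Q_K^H Q_1 & Q_K^H Q_2 & \cdots & Q_K^H Q_K \end{bmatrix} = \frac{1}{\sigma^2} Q^H Q, \tag{4.11}$$

where $Q = [Q_1, \ldots, Q_K]$ and

$$Q_k = \left[\, I_{M\times M} \otimes S^{(k)} \quad H_M^{(k)} \,\right]. \tag{4.12}$$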

4.1.3.3  Properties  of  the  CFIM

In this section, I develop properties of this CFIM. In particular, the singularity of the CFIM is proven and a lower bound on the dimension of its null space is detailed, as well as necessary and sufficient conditions on the parameter characteristics to attain this bound.

Given the inherent relationship between regularity of the FIM and identifiability as detailed in section 3.2, it should not be surprising that the FIM, and hence the CFIM, is singular. The model presented in (4.3) or (4.6) has a multiplicative ambiguity with any source interacting with its corresponding channel, i.e., for any nonsingular matrix $A \in \mathbb{C}^{(L_k+1)\times(L_k+1)}$, the input source matrix $S^{(k)}$ and the channel vector $h_m^{(k)}$ are indistinguishable from $S^{(k)}A$ and $A^{-1}h_m^{(k)}$, respectively. Over all the channels, this presumes a complex-valued multiplicative ambiguity of at least $ML_k$. Additionally, cross-source ambiguities exist between a source input and a different source channel. The limit for the minimal degrees of freedom, or the rank of the ambiguity, is given in the following theorem.

Theorem 4.7. The CFIM is singular and the dimension of its null space is lower bounded as

$$\operatorname{nullity}(\mathcal{I}(\vartheta)) \ge \sum_{i=1}^{K}\sum_{j=1}^{K} (L_i - L_j + 1)^+, \tag{4.13}$$

where $(a)^+ = a$ for $a \ge 0$ and $(a)^+ = 0$ for $a < 0$. This limit quantity is the nullity lower bound (NLB).

The proof can be found in appendix C. It constructs the following matrix, whose linearly independent columns are a basis for the null space of $X(\vartheta)$:

$$\mathcal{N} = \left[\ \cdots\ \right] \qquad (4.14)$$

(a sparse block matrix whose nonzero blocks are formed from the channel vectors $h^{(k)}$ and the negated source vectors $-s^{(k)}$), where its remaining blocks are defined in (C.2) and (C.4).

The nullity lower bound (NLB) may seem an unusual quantity. The term $(L_i - L_j + 1)_+$ represents the ambiguity due to the "tap window" of the $i$th channel masking the interaction of the $j$th channel with its corresponding source. This quantity also reveals that such an ambiguity exists only if the coherence-propagation delay of the $i$th channel is at least as great as that of the $j$th channel. Intuitively, a greater spread between the channel orders widens the overlapping window in which the $j$th channel's interaction can be masked, and thereby increases the ambiguity. Conversely, when the channel orders differ only slightly, the window in which one channel can cover that of another is limited, which increases the diversity of the channel.

A simple corollary follows, stating that the NLB, i.e., the dimension of the ambiguity set, is at least the square of the number of sources.

Corollary 4.8. $\operatorname{nullity}(X(\vartheta)) \geq K^2$.


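Proof. Note that $(L_i - L_j + 1)_+ + (L_j - L_i + 1)_+ \geq 2$, with equality if and only if $|L_i - L_j| \leq 1$. Thus,

$$\begin{aligned} \sum_{i=1}^{K}\sum_{j=1}^{K}(L_i - L_j + 1)_+ &= \sum_{i=1}^{K}\sum_{\substack{j=1 \\ j \neq i}}^{K}(L_i - L_j + 1)_+ + \sum_{i=1}^{K}(L_i - L_i + 1)_+ \\ &= \sum_{i=1}^{K}\sum_{j=i+1}^{K}\left[(L_i - L_j + 1)_+ + (L_j - L_i + 1)_+\right] + K \\ &\geq \frac{K(K-1)}{2}\cdot 2 + K = K^2. \end{aligned}$$

□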


The proof also reveals that this $K^2$ degree of uncertainty can be attained only if any two channel orders differ by at most one tap, which agrees with the intuition that channel diversity is enhanced when the channel orders are not widely spread.
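The counting argument in the proof of corollary 4.8 can be checked exhaustively for small $K$. The sketch below (helper names are hypothetical; orders are restricted to $0, \ldots, 4$) asserts that the NLB is at least $K^2$ and that equality holds exactly when all channel orders lie within one tap of each other.

```python
from itertools import product


def nlb(orders):
    """Nullity lower bound (4.13): sum over ordered pairs of (L_i - L_j + 1)_+."""
    return sum(max(Li - Lj + 1, 0) for Li in orders for Lj in orders)


def check_corollary_4_8(K, max_order=4):
    """Check NLB >= K^2 for all order tuples, with equality iff the spread is <= 1."""
    for orders in product(range(max_order + 1), repeat=K):
        value = nlb(orders)
        assert value >= K * K
        assert (value == K * K) == (max(orders) - min(orders) <= 1)
    return True


for K in range(1, 5):
    check_corollary_4_8(K)
print(nlb([3, 3, 4]), nlb([0, 2]))  # spread of 1 versus spread of 2
```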

In addition to the ambiguity due to the channel order spread, the degrees of freedom in the model depend on the number of parameters and the number of observations.

Theorem 4.9. $\operatorname{nullity}(X(\vartheta)) \geq \sum_{k=1}^{K}\left(N + L_k + M(L_k+1)\right) - MN$.

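Proof. Since $X(\vartheta) = \frac{1}{\sigma^2}Q^H Q$, $\operatorname{rank}(X(\vartheta)) = \operatorname{rank}(Q)$. Since $Q$ is an $MN \times \sum_{k=1}^{K}\left(N + L_k + M(L_k+1)\right)$ matrix, $\operatorname{rank}(X(\vartheta)) \leq \min\left\{MN,\ \sum_{k=1}^{K}\left(N + L_k + M(L_k+1)\right)\right\}$, and therefore

$$\operatorname{nullity}(X(\vartheta)) = \operatorname{colsize}(X(\vartheta)) - \operatorname{rank}(X(\vartheta)) \geq \max\left\{0,\ \sum_{k=1}^{K}\left(N + L_k + M(L_k+1)\right) - MN\right\}.$$

□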

The number of columns of $Q$ (or $X(\vartheta)$) corresponds to the number of unknown parameters. Likewise, the number of rows of $Q$ corresponds to the number of equations (observations) in the model (4.2). In most scenarios, having more equations (rows) than unknowns (columns) is a necessary requirement for the unknowns to be solvable. Thus, the degrees of freedom of the ambiguity space are at least the number of unknowns (parameters) less the number of equations. However, if the equations are linearly dependent (redundant), as is the case in the convolutive mixture model, then the degrees of freedom are potentially greater. The following corollary combines theorems 4.7 and 4.9.

Corollary 4.10.

$$\operatorname{nullity}(X(\vartheta)) \geq \max\left\{\sum_{i=1}^{K}\sum_{j=1}^{K}(L_i - L_j + 1)_+,\ \sum_{k=1}^{K}\left(N + L_k + M(L_k+1)\right) - MN\right\}.$$

The NLB depends only on the channel orders and the number of sources, whereas the "equations vs. unknowns" nullity bound depends on the channel orders, the sample size per source, the number of subchannels per source, and the number of sources. Hence, it is of interest to determine the conditions on the sample size $N$ and the number of channels $M$ per source under which this second bound is of no consequence. Control over the dimensions $K$ (the number of transmitters), $M$ (the number of receivers), and $N$ (the number of time snapshots or transmission length) is possible in the design of many communications models. The next two results determine conditions on the dimensions of the model that allow the NLB to be the minimum possible degrees of freedom. The first condition requires more subchannels per source than sources (or more receivers than transmitters in a communications context).

Theorem 4.11. The CFIM $X(\vartheta)$ can attain the CFIM nullity lower bound only if $M > K$.

Proof. If $M \leq K$, then $H = [H^{(1)}, \ldots, H^{(K)}]$, which is an $MN \times \left(KN + \sum_{k=1}^{K} L_k\right)$ matrix, cannot have full column rank by theorem 4.2. This implies the existence of a null space of larger dimension than the space spanned by the columns of $\mathcal{N}$ in (4.14). Hence

$$\operatorname{nullity}(X(\vartheta)) > \sum_{i=1}^{K}\sum_{j=1}^{K}(L_i - L_j + 1)_+.$$

□

Under the assumption that the model has more receivers than transmitters, corollary 4.10 implies a minimal requirement on the number of transmission snapshots.


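Theorem 4.12. Provided $M > K$, the CFIM $X(\vartheta)$ can attain the CFIM nullity lower bound only if

$$N \geq \frac{M+1}{M-K}\sum_{j=1}^{K} L_j + K + \frac{1}{M-K}\left(K^2 - \sum_{i=1}^{K}\sum_{j=1}^{K}(L_i - L_j + 1)_+\right). \qquad (4.15)$$

Proof. We require

$$\sum_{i=1}^{K}\sum_{j=1}^{K}(L_i - L_j + 1)_+ \geq \sum_{k=1}^{K}\left(N + L_k + M(L_k+1)\right) - MN,$$

i.e., $(M-K)N \geq (M+1)\sum_{k=1}^{K} L_k + MK - \sum_{i=1}^{K}\sum_{j=1}^{K}(L_i - L_j + 1)_+$. Writing $MK = K(M-K) + K^2$ and dividing by $M - K > 0$ gives (4.15). □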


As the channel diversity (in terms of the number of channels per source) increases, the data size necessary to attain the CFIM NLB decreases. As $M \to \infty$, the bound on the sample size becomes simply $N \geq K + \sum_{j=1}^{K} L_j$, which is comparable to theorem 4.4(d). However, for any finite $M$ and nontrivial channel orders, the bound is effectively $N \geq K + \sum_{j=1}^{K} L_j + 1$, since

$$-\sum_{i=1}^{K}\sum_{j=1}^{K}(L_i - L_j + 1)_+ \geq -K\sum_{i=1}^{K}(L_i + 1),$$

and hence a looser bound than (4.15) is

$$N \geq \frac{M+1}{M-K}\sum_{l=1}^{K} L_l + K + \frac{1}{M-K}\left(K^2 - K\sum_{l=1}^{K}(L_l + 1)\right) = \sum_{l=1}^{K} L_l + K + \frac{1}{M-K}\sum_{l=1}^{K} L_l,$$

whose last term is strictly positive for nontrivial channel orders.

In the scenario requiring the most data samples, $M = K + 1$, the system requires

$$N \geq (K+2)\sum_{l=1}^{K} L_l + K + \left(K^2 - \sum_{i=1}^{K}\sum_{j=1}^{K}(L_i - L_j + 1)_+\right).$$

(This last term is always nonpositive.) In the SIMO scenario ($K = 1$), this necessary condition becomes $N \geq 3L + 1$, which agrees with the requirement in [31].⁶
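The algebra behind the looser bound and the SIMO special case can be verified with exact rational arithmetic. This sketch (helper names are illustrative; Python's `fractions` module avoids rounding) checks over small orders that the right-hand side of (4.15) never falls below the looser bound $\sum_l L_l + K + \sum_l L_l/(M-K)$, that the $M = K + 1$ expression above matches (4.15), and that for $K = 1$, $M = 2$ the bound reduces to $N \geq 3L + 1$.

```python
from fractions import Fraction
from itertools import product


def nlb(orders):
    """Nullity lower bound (4.13)."""
    return sum(max(Li - Lj + 1, 0) for Li in orders for Lj in orders)


def rhs_415(M, orders):
    """Right-hand side of (4.15) as an exact fraction (requires M > K)."""
    K, total = len(orders), sum(orders)
    return Fraction(M + 1, M - K) * total + K + Fraction(K * K - nlb(orders), M - K)


def looser_rhs(M, orders):
    """Looser necessary bound: sum(L) + K + sum(L) / (M - K)."""
    K, total = len(orders), sum(orders)
    return total + K + Fraction(total, M - K)


for K in (1, 2, 3):
    for orders in product(range(4), repeat=K):
        for M in range(K + 1, K + 5):
            # (4.15) is never weaker than the looser bound.
            assert rhs_415(M, orders) >= looser_rhs(M, orders)
        # M = K + 1 specialization of (4.15).
        special = (K + 2) * sum(orders) + K + (K * K - nlb(orders))
        assert rhs_415(K + 1, orders) == special

# SIMO case (K = 1, M = 2): (4.15) reduces to N >= 3L + 1.
for L in range(6):
    assert rhs_415(2, [L]) == 3 * L + 1
```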

Given that $M > K$ and $N$ satisfies (4.15), the minimum possible nullity of $X(\vartheta)$ is the nullity lower bound,

$$\sum_{i=1}^{K}\sum_{j=1}^{K}(L_i - L_j + 1)_+,$$

i.e., it is possible to limit the ambiguity strictly to the mixing of sources in the convolutive channel. If the conditions under which the CFIM attains this nullity lower bound can be characterized, then the null space is known to be completely characterized by the columns of $\mathcal{N}$ in (4.14). The importance of this is that when the null space can be parameterized, theorems 3.23 and 3.24 can be used to determine constraints that lead to regularity of the CFIM (and FIM). Using the concepts of signal excitation (modes) and channel diversity (irreducibility and column-reducedness) as defined in section 4.1.1.2, the following theorems also establish a correspondence between the conditions for near-regularity, when the FIM attains the NLB, and those for near-identifiability (or strict identifiability) in section 4.1.2.

⁶ For the SIMO ($K = 1$) case, this nullity lower bound is exactly 1, and parameters for which the CFIM attains this bound are said to be Fisher information identifiable in [30]. The notion of identifiability for the SIMO case is sensible since, by scaling a single parameter, it is possible to obtain a bound on all the remaining parameters relative to the scaled parameter. This naturally connects with the notion of strict identifiability in section 4.1.2. This interpretation does not extend simply to the MIMO ($K > 1$) case; however, it shall be shown that there does exist an inherent connection between a CFIM attaining the NLB and the notion of strict identifiability.

Theorem 4.13 (CFIM NLB necessary conditions). The $M$-channel $K$-source FIR system Fisher information matrix has a nullity of exactly the NLB in (4.13) only if

(a) $H(z)$ is irreducible and column-reduced,

(b) $p_{\text{total}} \geq K + \sum_{j=1}^{K} L_j$,

(c) $p_k \geq L_k + 2$ for $k = 1, \ldots, K$, or $p_k \geq 1$ if $L_k = 0$,

(d) $N \geq K + \sum_{j=1}^{K} L_j$, and

(e) $M > K$.

Theorem 4.14 (CFIM NLB sufficiency conditions). The $M$-channel $K$-source FIR system FIM has a nullity of exactly the NLB in (4.13) if

(a) $H(z)$ is irreducible and column-reduced,

(b) $p_{\text{total}} \geq K + (K+1)\sum_{j=1}^{K} L_j$,

(c) $p_k \geq L_k + 1 + \sum_{j=1}^{K} L_j$ for $k = 1, \ldots, K$,

(d) $N \geq K + (K+2)\sum_{j=1}^{K} L_j$, and

(e) $M > K$.

Table 4.2: Necessary and sufficient data sizes for the FIM to attain the NLB ($M = 6$ channels, $L_k = 5$ for all sources)

    # of sources K              1     2     3     4     5
    necessary data size N       8    20    38    74   180
    sufficient data size N     16    42    78   124   180

With the exception of condition (e), theorems 4.13 and 4.14 are identical to theorems 4.4 and 4.5, respectively. This is an expected result, since the Fisher information for identifiable parametric components within the model should meet certain regularity conditions (see theorem 2.5). The condition in theorem 4.13(d) is weaker than that in theorem 4.12, but it is left weaker to agree with the necessary condition for strict identifiability in theorem 4.4(d). (It is unclear whether that condition would change if dependence on the channel size $M$ were considered.) Table 4.2 shows the growth in the necessary and sufficient data size $N$ as the number of sources increases for fixed channel size and channel orders, using the stronger condition in theorem 4.12 instead of that in theorem 4.13(d). As the number of sources approaches the number of channels, the necessary data size grows to match the sufficient data size, which leads to theorem 4.15 and corollary 4.16.
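Table 4.2 follows directly from (4.15) and theorem 4.14(d). The short sketch below recomputes both rows with exact arithmetic for $M = 6$ and $L_k = 5$. The function names are illustrative, and rounding the necessary bound up to the next integer is an assumption about how the table entries were obtained.

```python
import math
from fractions import Fraction


def nlb(orders):
    """Nullity lower bound (4.13)."""
    return sum(max(Li - Lj + 1, 0) for Li in orders for Lj in orders)


def necessary_N(M, orders):
    """Smallest integer N satisfying (4.15); theorem 4.12 assumes M > K."""
    K, total = len(orders), sum(orders)
    assert M > K, "theorem 4.12 assumes M > K"
    bound = (Fraction(M + 1, M - K) * total + K
             + Fraction(K * K - nlb(orders), M - K))
    return math.ceil(bound)


def sufficient_N(orders):
    """Smallest N satisfying theorem 4.14(d): N >= K + (K + 2) * sum(L_j)."""
    K = len(orders)
    return K + (K + 2) * sum(orders)


M, L = 6, 5
rows = [(K, necessary_N(M, [L] * K), sufficient_N([L] * K)) for K in range(1, 6)]
for K, nec, suf in rows:
    print(K, nec, suf)
```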

First, the special case in which the CFIM attains the minimal ambiguity is considered. As stated earlier, this occurs when the windows of the channels are, in some sense, minimally spread.

Theorem 4.15. Given sufficient channel diversity and modes in the signals, e.g., the sufficient conditions of theorem 4.14, $\operatorname{nullity}(X(\vartheta)) = K^2$ if and only if $L_j \in \{L_0, L_0 + 1\}$ for all $j = 1, \ldots, K$, for some integer $L_0$.

Proof. Under the conditions of theorem 4.14, $\operatorname{nullity}(X(\vartheta)) = \text{NLB}$, and from the proof of corollary 4.8, $\text{NLB} = K^2$ if and only if $(L_i - L_j + 1)_+ + (L_j - L_i + 1)_+ = 2$ for each $i, j = 1, \ldots, K$. Either $(L_i - L_j + 1)_+ = (L_j - L_i + 1)_+ = 1$ (i.e., $L_i = L_j$), or $(L_i - L_j + 1)_+ = 2$ and $(L_j - L_i + 1)_+ = 0$ (i.e., $L_i = L_j + 1$), or vice versa. Hence any two channel orders differ by at most one. □

The next corollary details the necessary and sufficient data size for the sources in the special case of equal channel orders.

Corollary 4.16. Given sufficient signal modes and channel diversity, e.g., as in theorem 4.14, if $L_k = L$ for each $k$ and $M = K + 1$, then the CFIM attains the NLB if and only if

$$N \geq (K+2)KL + K.$$

4.1.4  Constraints  for  the  convolutive  mixture  model 

In the previous section, necessary and sufficient conditions on characteristics of the signal source and channel properties were detailed for the CFIM to have a null space of minimal dimension, equal to the nullity lower bound $\sum_{i=1}^{K}\sum_{j=1}^{K}(L_i - L_j + 1)_+$. In this section, pathways to regularity in the CFIM (and hence the FIM) as well as several typical constraint sets are considered.

4.1.4.1 Pathways to regularity


Constraints that make the Fisher information regular are not required in order to use this CCRB theory (theorem 3.17), but without constraints the identifiable parameters are often not in a form that is useful to the practitioner. In the communications context, constraints assumed a priori on a model are simply a common method of maintaining a particular parametric structure in the model.

Guided by theorems 3.23, 3.24, 3.25, 3.27, and 3.29, an objective of communications system design is to discover constraints under which identifiability of the parameters, as they are defined, is achieved, i.e., to determine what properties can be imposed on the signal or channel to guarantee parametric identifiability. For the convolutive mixture model in (4.2), this involves an examination of the Fisher information in (4.10) with the CFIM in (4.12). In some sense, this has already been done: section 4.1.3.3 derives conditions under which the CFIM has a null space spanned by the columns of the matrix $\mathcal{N}$ defined in (4.14).

Let $I(\theta) = L(\theta)L^T(\theta)$ be a Cholesky factorization of the Fisher information, and let $V(\theta)$ be an orthogonal complement to $L^T(\theta)$ [20, p. 194]. Then theorem 3.23 implies that, for any constraint to achieve local identifiability, its Jacobian must satisfy $F(\theta) = A L^T(\theta) + B V^T(\theta)$, where $B V^T(\theta)$ has full row rank. Similarly, theorem 3.24 implies that the constraints must have an orthonormal complement $U(\theta)$ satisfying $U(\theta) = L(\theta)C + V(\theta)D$, where $L(\theta)C$ is full column rank.

4.1.4.2 Norm channel + real-valued source constraint

The norm channel constraint

$$\|h^{(1)}\|^2 = 1 \qquad (4.16)$$

is a common scaling "trick" used in SIMO ($K = 1$ source) models to obtain channel estimates under second-order statistics assumptions. Combined with the (rotational) constraint restricting the source elements to be real-valued, i.e.,

$$\operatorname{Im}\left(s^{(1)}(n)\right) = 0 \qquad (4.17)$$

for $n = -L_1, -L_1 + 1, \ldots, N - 1$, it is clear that the multiplicative ambiguity that forms the basis of $\mathcal{N}$ is eliminated. For the parameter vector $\theta = [\operatorname{Re}(\vartheta^T), \operatorname{Im}(\vartheta^T)]^T$ with $\vartheta$ defined in (4.11), the constraints are essentially separated as

$$F(\theta) = \begin{bmatrix} F_{\operatorname{Re}(h)} & 0 & F_{\operatorname{Im}(h)} & 0 \\ 0 & F_{\operatorname{Re}(s)} & 0 & F_{\operatorname{Im}(s)} \end{bmatrix},$$

where $h = h^{(1)}$ and $s = s^{(1)}$. An orthonormal complement, or a basis for the null space, of $[F_{\operatorname{Re}(s)}, F_{\operatorname{Im}(s)}]$ is simply $\begin{bmatrix} I_{(N+L_1)\times(N+L_1)} \\ 0 \end{bmatrix}$, but an analytic formula for the null space relating to the channel components is not so simple and requires numerical programming for arbitrary sizes $L_1$ and $N$.


Example 4.17. As a particular example of this constraint, consider the $M = 2$ SIMO convolutive channel model with $L = 3$ taps, where the channel is predefined by

$$\begin{bmatrix} h_1^{(1)T} \\ h_2^{(1)T} \end{bmatrix} = \begin{bmatrix} 0.3079 + j0.0698 & 0.1657 + j0.2304 & 0.0198 - j0.3823 & 0.0929 - j0.1853 \\ -0.1841 + j0.3294 & 0.4484 - j0.1689 & 0.0156 + j0.1526 & 0.4750 - j0.0952 \end{bmatrix},$$

under a white noise assumption (with known variance $\sigma^2$) and the constraints in (4.16) and (4.17). The 54 ($N = 50$) signal elements are BPSK ($\pm 1$) symbols randomly generated with equal probability (i.e., Bernoulli(1/2)) and fixed for the simulation. The subspace method [52, 68] is used with a smoothing factor of 8. Additionally, scoring with constraints (CSA) using (3.36) is applied to the subspace method's estimate $\hat{h}$ for possible improvement. The results are shown in figure 4.2, where the mean-square error (MSE) per real channel coefficient is averaged over $R = 100$ runs (trials) and the SNR is expressed in decibels ($10\log_{10}$ of the signal-to-noise ratio).

This example also appears in [68] and shows how the subspace method, which previously had no theory-based performance benchmark, tracks the CCRB. The additional estimation using the method of scoring demonstrates the local maximum likelihood properties of the subspace method.
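One way to carry out the numerical step mentioned for the channel components, sketched here in NumPy rather than taken from this work, is to read an orthonormal basis for the null space of the norm-constraint gradient $[2\operatorname{Re}(h)^T, 2\operatorname{Im}(h)^T]$ off the right singular vectors of that row. The channel values are those of Example 4.17; the real ordering $[\operatorname{Re}(h); \operatorname{Im}(h)]$ and the helper name `norm_constraint_complement` are assumptions.

```python
import numpy as np

# Channel of Example 4.17: M = 2 subchannels with 4 coefficients each.
h = np.array([
    [0.3079 + 0.0698j, 0.1657 + 0.2304j, 0.0198 - 0.3823j, 0.0929 - 0.1853j],
    [-0.1841 + 0.3294j, 0.4484 - 0.1689j, 0.0156 + 0.1526j, 0.4750 - 0.0952j],
]).ravel()


def norm_constraint_complement(h):
    """Orthonormal basis for the null space of the gradient of ||h||^2 - 1.

    With the real ordering [Re(h); Im(h)], the gradient is the single row
    [2 Re(h)^T, 2 Im(h)^T]; the right singular vectors after the first span
    its null space.
    """
    F = 2.0 * np.concatenate([h.real, h.imag])[None, :]
    _, _, Vt = np.linalg.svd(F)  # full_matrices=True gives a square Vt
    return Vt[1:].T


U_h = norm_constraint_complement(h)
print(round(float(np.linalg.norm(h) ** 2), 4), U_h.shape)
```

The printed squared norm is close to 1 (the listed coefficients are rounded to four decimals), consistent with (4.16).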


Figure  4.2:  Norm-constrained  channel  estimation  performance. 




4.1.4.3 Semiblind constraints: $s^{(k)}(t) = p(t)$ for $t \in \mathcal{T}$


For multiple sources, designing communications with channel constraints, as in the previous section, is less tenable, since it is not always possible to establish guarantees on functions of the channel elements. The sources, however, are often entirely designable, since the class of signals passed through the channels can be restricted. The class only needs to be defined by some functional constraint.

Perhaps the simplest constraint is knowledge of a parameter element, i.e., $\theta_i = a$. This constraint produces a row vector $e_i^T$ in the Jacobian $F(\theta)$, where $e_i$ is the unit vector with unity in the $i$th position and zeros elsewhere. Therefore, the corresponding orthonormal complement $U(\theta)$ for this unit vector eliminates, or nulls out, the $i$th row and column of the Fisher information $I(\theta)$ while preserving the other elements. Any other nonredundant constraint in $f$ does not change this.
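The effect of a single known parameter can be seen numerically. In this sketch, a random positive definite matrix stands in for $I(\theta)$ and the helper name is hypothetical. The orthonormal complement of $e_i^T$ is the identity with column $i$ removed, so $U^T I U$ is $I(\theta)$ with its $i$th row and column deleted, and the constrained bound $U(U^T I U)^{-1}U^T$ has a zero $i$th row and column.

```python
import numpy as np


def known_parameter_complement(n, i):
    """Orthonormal complement of the constraint row e_i^T (theta_i known)."""
    return np.delete(np.eye(n), i, axis=1)


rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
fim = B.T @ B                      # stand-in for a regular Fisher information
i = 2
U = known_parameter_complement(5, i)
reduced = U.T @ fim @ U            # fim with row and column i removed
ccrb = U @ np.linalg.inv(reduced) @ U.T
print(np.allclose(reduced, np.delete(np.delete(fim, i, axis=0), i, axis=1)))
print(ccrb[i])                     # the known parameter has a zero bound
```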

Theorem 4.18. Assume the conditions of theorem 4.14; then $\operatorname{nullity}(X(\vartheta)) = \sum_{i=1}^{K}\sum_{j=1}^{K}(L_i - L_j + 1)_+$. Let $X^*(\vartheta)$ denote $X(\vartheta)$ with the rows $\left\{i_p : p = 1, \ldots, \sum_{i=1}^{K}\sum_{j=1}^{K}(L_i - L_j + 1)_+\right\}$ and the corresponding columns removed. Then $\operatorname{nullity}(X^*(\vartheta)) = 0$ if and only if $\mathcal{N}^*$ is full column rank, where $\mathcal{N}^*$ is the matrix formed by taking the rows $\{i_p\}$ of $\mathcal{N}$ in (4.14).

Proof. Let $F$ be the matrix that, when multiplied by $\mathcal{N}$, selects the rows $\{i_p\}$, i.e., $F$ is the matrix consisting of the row vectors $\{e_i^T : i \in \{i_p\}\}$. Then $F\mathcal{N} = \mathcal{N}^*$, a square full-rank matrix. Therefore $F = A X(\vartheta) + B\mathcal{N}^T$ for some $A$ and some $B$, where $B\mathcal{N}^T$ is full row rank. As discussed in section 4.1.4.1, this identifies the parameters, and hence $\operatorname{nullity}(X^*(\vartheta)) = 0$ by theorem 3.24. □
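The linear-algebra core of theorem 4.18 can be exercised on a toy matrix. The sketch below builds a positive semidefinite matrix whose null space is exactly the span of a chosen $\mathcal{N}$ (an orthogonal projector stands in for $X(\vartheta)$, an assumption for illustration only), deletes the rows and columns indexed by a set $P$, and compares the nullity of the result with the rank of the selected rows $\mathcal{N}^*$. The helper names are hypothetical.

```python
import numpy as np


def nullity(X, rel_tol=1e-10):
    """Number of singular values that are zero relative to the largest."""
    sv = np.linalg.svd(X, compute_uv=False)
    return int(np.sum(sv <= rel_tol * max(sv[0], 1.0)))


def reduced_nullity(X, P):
    """Nullity of X after deleting the rows and columns indexed by P."""
    removed = set(P)
    keep = [i for i in range(X.shape[0]) if i not in removed]
    return nullity(X[np.ix_(keep, keep)])


# Toy null-space basis (5 x 2); row 3 is zero on purpose.
N = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [0.0, 0.0],
              [2.0, -1.0]])
# Orthogonal projector onto the complement of span(N): PSD with null space span(N).
X = np.eye(5) - N @ np.linalg.solve(N.T @ N, N.T)

for P in ([0, 1], [0, 3]):
    print(P, np.linalg.matrix_rank(N[P]), reduced_nullity(X, P))
```

Selecting rows 0 and 1 gives a nonsingular $\mathcal{N}^*$ and a regular reduced matrix, whereas selecting rows 0 and 3 (with row 3 of $\mathcal{N}$ zero) leaves a one-dimensional null space, as the theorem predicts.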

This theorem can be used to specify either channel or source parameters to achieve identifiability and FIM regularity. For example, if only source signal parameters are specified, then, under the conditions of theorem 4.18, it is necessary and sufficient to specify $\sum_{j=1}^{K}(L_i - L_j + 1)_+$ parameters of $s^{(i)}$ for each $i = 1, \ldots, K$.
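The per-source count in the preceding paragraph is easy to tabulate. This hypothetical helper returns $\sum_j (L_i - L_j + 1)_+$ for each source; the counts sum to the NLB, and for equal orders each source needs $K$ specified parameters.

```python
def known_symbols_per_source(orders):
    """Source parameters to specify for source i: sum_j (L_i - L_j + 1)_+."""
    return [sum(max(Li - Lj + 1, 0) for Lj in orders) for Li in orders]


counts = known_symbols_per_source([2, 1, 0])
print(counts, sum(counts))
```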


4.1.4.4 Unit modulus constraint + semiblind constraint

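The unit or constant modulus constraint is a particularly powerful and reasonable assumption. All the source elements are assumed to have unit modulus, i.e.,

$$|s^{(k)}(n)|^2 = 1, \qquad (4.18)$$

for every $n = -L_k, \ldots, N - 1$ and for each $k = 1, \ldots, K$. This constraint is useful for modeling $P$-ary phase-shift keying (PSK) in communications models with unknown $P$, where the signals are assumed to be drawn from a finite constellation of $P$ equispaced points on the unit circle. While in practice this assumption is often viewed as a single constraint, it ultimately comprises $\sum_{i=1}^{K}(N + L_i)$ constraints. Nevertheless, it is insufficient by itself to identify the parameters. For the parameter vector in (4.11), the constraint has a gradient

$$F(\theta) = \begin{bmatrix} 0 & 2\operatorname{Re}(S_d^{(1)}) & 0 & 0 & \cdots & 0 & 2\operatorname{Im}(S_d^{(1)}) & 0 & 0 & \cdots \\ 0 & 0 & 0 & 2\operatorname{Re}(S_d^{(2)}) & \cdots & 0 & 0 & 0 & 2\operatorname{Im}(S_d^{(2)}) & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots & \ddots \end{bmatrix},$$

where $S_d^{(k)} = \operatorname{diag}(s^{(k)})$ and the column blocks follow the ordering $\operatorname{Re}(h^{(1)}), \operatorname{Re}(s^{(1)}), \operatorname{Re}(h^{(2)}), \operatorname{Re}(s^{(2)}), \ldots, \operatorname{Im}(h^{(1)}), \operatorname{Im}(s^{(1)}), \ldots$ of $\theta$. This gradient has an orthonormal complement (shown for the first two sources)

$$U(\theta) = \begin{bmatrix} 0 & I_{M(L_1+1)} & 0 & 0 & 0 & 0 & \cdots \\ -\operatorname{Im}(S_d^{(1)}) & 0 & 0 & 0 & 0 & 0 & \cdots \\ 0 & 0 & 0 & 0 & I_{M(L_2+1)} & 0 & \cdots \\ 0 & 0 & 0 & -\operatorname{Im}(S_d^{(2)}) & 0 & 0 & \cdots \\ \vdots & & & & & & \\ 0 & 0 & I_{M(L_1+1)} & 0 & 0 & 0 & \cdots \\ \operatorname{Re}(S_d^{(1)}) & 0 & 0 & 0 & 0 & 0 & \cdots \\ 0 & 0 & 0 & 0 & 0 & I_{M(L_2+1)} & \cdots \\ 0 & 0 & 0 & \operatorname{Re}(S_d^{(2)}) & 0 & 0 & \cdots \\ \vdots & & & & & & \end{bmatrix}$$

satisfying (3.6). For each source, the three column blocks correspond to the source phase, the real part of the channel, and the imaginary part of the channel. Using this complement with (4.10) and (4.12), the $(i, j)$ subblock of $U^T(\theta)I(\theta)U(\theta)$ has the structure

$$C(i,j) = \frac{2}{\sigma^2}\begin{bmatrix} \operatorname{Re}[S_d^{(i)H}H^{(i)H}H^{(j)}S_d^{(j)}] & \operatorname{Im}[S_d^{(i)H}H^{(i)H}(I_M \otimes S^{(j)})] & \operatorname{Re}[S_d^{(i)H}H^{(i)H}(I_M \otimes S^{(j)})] \\ -\operatorname{Im}[(I_M \otimes S^{(i)})^H H^{(j)}S_d^{(j)}] & \operatorname{Re}[I_M \otimes S^{(i)H}S^{(j)}] & -\operatorname{Im}[I_M \otimes S^{(i)H}S^{(j)}] \\ \operatorname{Re}[(I_M \otimes S^{(i)})^H H^{(j)}S_d^{(j)}] & \operatorname{Im}[I_M \otimes S^{(i)H}S^{(j)}] & \operatorname{Re}[I_M \otimes S^{(i)H}S^{(j)}] \end{bmatrix}.$$

Lack of identifiability is noted since

$$C(k,k)\begin{bmatrix} 1_{N+L_k} \\ \operatorname{Im}(h^{(k)}) \\ -\operatorname{Re}(h^{(k)}) \end{bmatrix} = 0$$

for each $k = 1, \ldots, K$. Careful examination determines that these are the only vectors in the null space of $U^T(\theta)I(\theta)U(\theta)$, resulting in the following theorem.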
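The following NumPy sketch (illustrative only; random 8PSK symbols and a random SIMO channel are assumptions) checks three facts used above: the phase columns $[-\operatorname{Im}(S_d^{(k)}); \operatorname{Re}(S_d^{(k)})]$ are orthonormal, they are annihilated by the unit-modulus gradient, and $[1_{N+L_k}; \operatorname{Im}(h^{(k)}); -\operatorname{Re}(h^{(k)})]$ lies in the null space of $C(k,k)$. It assumes $I(\theta)$ is the usual real FIM $\frac{2}{\sigma^2}\operatorname{Re}(\cdot)$ of the Gaussian mean model and forms $C(k,k) = 2\operatorname{Re}(W^H W)$ with $\sigma^2 = 1$, where the columns of $W$ are the images under $Q_k$ of the complement directions (phase, real channel, imaginary channel).

```python
import numpy as np


def source_blocks(s, h, M, L, N):
    """(I_M kron S, H) for one source: row n of channel m is sum_l h_m(l) s(n - l)."""
    S = np.array([[s[n - l + L] for l in range(L + 1)] for n in range(N)])
    H = np.zeros((M * N, N + L), dtype=complex)
    for m in range(M):
        for n in range(N):
            for l in range(L + 1):
                H[m * N + n, n - l + L] = h[m * (L + 1) + l]
    return np.kron(np.eye(M), S), H


M, L, N, P = 2, 2, 12, 8
rng = np.random.default_rng(1)
s = np.exp(2j * np.pi * rng.integers(0, P, size=N + L) / P)  # 8PSK symbols
h = rng.standard_normal(M * (L + 1)) + 1j * rng.standard_normal(M * (L + 1))
T, H = source_blocks(s, h, M, L, N)

# Unit-modulus gradient w.r.t. [Re(s); Im(s)] and its phase complement.
F = np.hstack([2 * np.diag(s.real), 2 * np.diag(s.imag)])
U_phase = np.vstack([-np.diag(s.imag), np.diag(s.real)])

# Images under Q of the complement columns: phase, Re(channel), Im(channel).
W = np.hstack([1j * H @ np.diag(s), T, 1j * T])
C = 2.0 * np.real(W.conj().T @ W)  # C(k,k) with sigma^2 = 1
v = np.concatenate([np.ones(N + L), h.imag, -h.real])
print(np.abs(F @ U_phase).max(), np.abs(C @ v).max())
```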


Theorem 4.19. Assume $\operatorname{nullity}(I(\theta)) = 2\sum_{i=1}^{K}\sum_{j=1}^{K}(L_i - L_j + 1)_+$. If all sources are assumed to be unit modulus and one complex-valued parameter for each source is assumed known, then $U^T(\theta)I(\theta)U(\theta)$ will be regular (and the model locally identifiable).


Intuitively, the unit modulus constraint eliminates the convolutive mixture ambiguity and the intra-source multiplicative amplitude ambiguity, leaving only an intra-source multiplicative phase ambiguity.

Example 4.20. Consider a $K = 2$ source, $M = 2$ sensor linear instantaneous mixing model ($L_k = 0$ for all sources) with $N = 30$ samples per source, under $R = 3$-ray multipath subchannels, i.e., the spatial signature of the $k$th source is expressed as a weighted sum of steering vectors, $h_k = \sum_{r=1}^{R}\beta_{kr}a(\phi_{kr})$, where $\beta_{kr}$ and $\phi_{kr}$ are the complex-valued amplitude and the real-valued AOA of the $r$th multipath of the $k$th source (see Figure 4.3). The AOAs are $\{\psi - 1, \psi, \psi + 4\}$ for source $s^{(1)}$ and $\{\psi + \Delta\psi - 5, \psi + \Delta\psi, \psi + \Delta\psi + 6\}$ for source $s^{(2)}$, with corresponding complex-valued amplitudes $\{\ldots\}$. (See section 4.2 for a more detailed description of steering vectors.) The source elements come from an 8PSK alphabet with signal powers $\text{SNR}(s^{(1)}) = 20$ dB and $\text{SNR}(s^{(2)}) = 15$ dB. The channel elements are normalized so that $\text{SNR}(s^{(k)}) = \|h_k\|^2/(M\sigma^2)$ with $\sigma^2 = 1$. The constraints assumed are unit modulus (8PSK) as well as knowledge of the first $T = 2$ symbols of each source, more than sufficient for identifiability and FIM regularity (theorem 4.19).

Figure 4.3: Example of multipath channel.

An initial estimate is obtained using the zero-forcing (bias-reducing) variant of the algebraic constant modulus algorithm (ZF-ACMA) [71]. This algorithm is a useful tool for estimating constant modulus source parameters in short data length experiments (only $N > K^2$ or $2K$ required), but it has no means of incorporating the training side information. The ZF-ACMA estimate is projected onto $\Theta_f$ to establish an initialization for the method of scoring with constraints. The step size rule chose $\alpha^{(k)} = 2^{-m}$ for the least positive integer $m$ satisfying (3.37). The average MSE (per real parameter coefficient) at each iteration over 5000 trials is compared with the CCRB for each source in figure 4.4.


Figure 4.4: Source estimation with varying $\Delta\psi$.


Compared with ZF-ACMA, utilizing the complete side information in scoring improves the mean-square error and maintains efficiency with respect to the constrained CRB for moderately separated angles of arrival. As should be expected, the estimation performance degrades as the primary source angles overlap, but even in the worst-case scenario with $\Delta\psi = 0°$ of separation, the approximate corresponding 8PSK bit-error rates (BER) for ZF-ACMA and scoring with constraints are 0.2914 and 0.0254, respectively, meaning the estimation schemes result in bit decision errors roughly 29% and 3% of the time. The departure of the estimation performance from the CCRB as $\Delta\psi$ approaches 0° is possibly due to a loss of unbiasedness in the estimation.

4.2  Calibrated  Array  Model 

The narrowband (calibrated) array model may be written as

$$y_m(n) = \sum_{k=1}^{K} a_m(\omega_k)\gamma_k s^{(k)}(n) + w_m(n) \qquad (4.19)$$

for $n = 1, \ldots, N$ and $m = 1, \ldots, M$, where $s^{(k)}(n)$ is the value of the $k$th input source at time index $n$, $\gamma_k$ is the complex-valued channel gain for the $k$th input, $a_m(\omega_k)$ is the $m$th sensor response to the $k$th input source, $\omega_k$ is the angle of arrival (AOA) of the $k$th source, and $w_m(n)$ is the noise, modeled as zero-mean circular Gaussian with variance $\sigma^2$, iid in both time and space. In vector-matrix notation, the model for each time slot can be written as

$$y(n) = \sum_{k=1}^{K} a(\omega_k)\gamma_k s^{(k)}(n) + w(n) = A(\omega)\Gamma s(n) + w(n), \qquad (4.20)$$

where the input is given by $s^T(n) = [s^{(1)}(n), \ldots, s^{(K)}(n)]$, the channel gain matrix is $\Gamma = \operatorname{diag}(\gamma_1, \ldots, \gamma_K)$, and the response matrix is given by

$$A(\omega) = \left[a(\omega_1)\ \ a(\omega_2)\ \cdots\ a(\omega_K)\right] \qquad (4.21)$$

Figure 4.5: Calibrated array model geometries for (a) uniform linear and (b) uniform circular arrays.

with the array response vectors $a(\omega_k)$ depending on the physical design of the array and the AOA. Calibration posits this known array geometry. The examples considered in this section consist of the uniform linear array (ULA) and the uniform circular array (UCA), as shown in Figure 4.5. For example, for the ULA the response vectors typically have a Vandermonde structure with base $e^{-j\pi\sin(\omega_k)}$.
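For the ULA, a minimal sketch of the response vector and of drawing data from (4.20) is given below. Half-wavelength sensor spacing (which gives the base $e^{-j\pi\sin(\omega_k)}$), AOAs in radians, and the helper names `ula_steering` and `simulate_array` are assumptions; other geometries such as the UCA would need a different response function.

```python
import numpy as np


def ula_steering(omega, M):
    """ULA response [1, z, ..., z^(M-1)] with z = exp(-j*pi*sin(omega)).

    Assumes half-wavelength sensor spacing and omega in radians.
    """
    z = np.exp(-1j * np.pi * np.sin(omega))
    return z ** np.arange(M)


def simulate_array(omegas, gains, S, M, sigma2, rng):
    """Draw y(n) = A(omega) Gamma s(n) + w(n) for n = 1..N, as in (4.20).

    S has shape (K, N); the noise is circular complex Gaussian with variance sigma2.
    """
    A = np.column_stack([ula_steering(w, M) for w in omegas])  # (4.21)
    Gamma = np.diag(gains)
    shape = (M, S.shape[1])
    noise = np.sqrt(sigma2 / 2) * (rng.standard_normal(shape)
                                   + 1j * rng.standard_normal(shape))
    return A @ Gamma @ S + noise, A


rng = np.random.default_rng(0)
S = np.exp(2j * np.pi * rng.integers(0, 8, size=(2, 30)) / 8)  # 8PSK, K = 2
Y, A = simulate_array([0.1, 0.5], [1.0, 0.5j], S, M=4, sigma2=0.01, rng=rng)
print(Y.shape, np.round(np.abs(A), 6))
```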

4.2.1  The  Fisher  information  of  the  calibrated  array  model 

4.2.1.1 Indirect derivation of the FIM

As was mentioned in section 4.1, the calibrated array model can be treated as a special case of the convolutive mixture model using constraints. The method for transforming the model is to keep the parameters of the instantaneous mixing model and to add to the parameter vector extra parameters for the calibrated array and channel gain. When evaluating the FIM, the elements in the rows and columns corresponding to these extra parameters are zero. The constraint then represents the model reparameterization; e.g., in this case we choose the constraints that define $H = A(\omega)\Gamma$, element by element. The resulting CCRB submatrix corresponding to the extra parameters will be equivalent to the CRB of those parameters. This procedure is made clear in the following example.


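Example 4.21. Assume $x \sim \mathcal{N}(ab, 1)$. The FIM for $\theta^T = [a, b]$ is

$$I(\theta) = \begin{bmatrix} b^2 & ab \\ ab & a^2 \end{bmatrix},$$

which is singular. Suppose we wish to reparameterize the model, replacing $ab$ with $c$. This is equivalent to the constraint $f(\theta^*) = ab - c$ for the expanded parameter vector $\theta^{*T} = [a, b, c]$. For the complement (a basis for the null space of the Jacobian of the constraints)

$$U(\theta^*) = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ b & a \end{bmatrix},$$

$U^T(\theta^*)I(\theta^*)U(\theta^*) = I(\theta)$, which is still singular. Hence $\text{CCRB}(\theta^*) = U(\theta^*)P(\theta)U^T(\theta^*)$, where the pseudoinverse is

$$P(\theta) = \frac{1}{(a^2+b^2)^2}\begin{bmatrix} b^2 & ab \\ ab & a^2 \end{bmatrix},$$

or

$$\text{CCRB}(\theta^*) = \frac{1}{(a^2+b^2)^2}\begin{bmatrix} b^2 & ab & b^3 + a^2b \\ ab & a^2 & ab^2 + a^3 \\ b^3 + a^2b & ab^2 + a^3 & b^4 + 2a^2b^2 + a^4 \end{bmatrix}.$$

The component corresponding to $c$ is $\text{CCRB}(c) = \frac{1}{(a^2+b^2)^2}\left(b^4 + 2a^2b^2 + a^4\right) = 1$, which agrees with the value of $\text{CRB}(c)$ for the model $x \sim \mathcal{N}(c, 1)$.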
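Example 4.21 can be reproduced numerically. The sketch below (values $a = 2$, $b = 3$ chosen arbitrarily; the helper name `ccrb` is hypothetical) evaluates $U(U^T I U)^{+}U^T$ with NumPy's Moore-Penrose pseudoinverse and compares it with the closed form above.

```python
import numpy as np


def ccrb(fim, U):
    """Constrained CRB U (U^T I U)^+ U^T, using the pseudoinverse."""
    return U @ np.linalg.pinv(U.T @ fim @ U) @ U.T


a, b = 2.0, 3.0
# FIM of the expanded parameter [a, b, c]: c does not enter the density of x,
# so its row and column are zero.
fim_expanded = np.array([[b * b, a * b, 0.0],
                         [a * b, a * a, 0.0],
                         [0.0,   0.0,   0.0]])
# Basis for the null space of the constraint Jacobian [b, a, -1] of f = ab - c.
U = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [b,   a]])
assert np.allclose(np.array([[b, a, -1.0]]) @ U, 0.0)

bound = ccrb(fim_expanded, U)
print(bound[2, 2])  # CCRB(c)
```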


Note that neither the original model nor the replacement model need be identifiable. Additional constraints under either model can also be included in the constraint function. The example verifies that, even for the more difficult case, it is possible to find the CRB and CCRB for the calibrated array model indirectly from the instantaneous mixing model using constraints. However, in the interest of clarity, the FIM and CCRB are derived directly for the calibrated array model in the next section.


4.2.1.2 Direct derivation of the FIM

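The calibrated model in (4.19) has a likelihood given by

$$p(y(1), \ldots, y(N); \theta) = \frac{1}{(\pi\sigma^2)^{MN}}\exp\left(-\frac{1}{\sigma^2}\sum_{n=1}^{N}\left(y(n) - A(\omega)\Gamma s(n)\right)^H\left(y(n) - A(\omega)\Gamma s(n)\right)\right). \qquad (4.22)$$

For clarity, since a mixture of complex and real parameters requires estimation, we use the parameter vector⁷

$$\theta = \left[\operatorname{Re}(s^T(1)),\ \operatorname{Im}(s^T(1)),\ \ldots,\ \operatorname{Re}(s^T(N)),\ \operatorname{Im}(s^T(N)),\ \operatorname{Re}(\gamma^T),\ \operatorname{Im}(\gamma^T),\ \omega^T\right]^T \qquad (4.23)$$

with $\gamma^T = [\gamma_1, \ldots, \gamma_K]$ and $\omega^T = [\omega_1, \ldots, \omega_K]$. The Fisher information is then given by

$$I(\theta) = \begin{bmatrix} \mathcal{M} & 0 & \cdots & 0 & \mathcal{M}_1 & \mathcal{L}_1 \\ 0 & \mathcal{M} & \cdots & 0 & \mathcal{M}_2 & \mathcal{L}_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & \cdots & \mathcal{M} & \mathcal{M}_N & \mathcal{L}_N \\ \mathcal{M}_1^T & \mathcal{M}_2^T & \cdots & \mathcal{M}_N^T & \mathcal{K}_\gamma & \mathcal{L} \\ \mathcal{L}_1^T & \mathcal{L}_2^T & \cdots & \mathcal{L}_N^T & \mathcal{L}^T & \mathcal{K}_\omega \end{bmatrix} \qquad (4.24)$$

where

$$\mathcal{M} = \begin{bmatrix} \operatorname{Re}(M) & -\operatorname{Im}(M) \\ \operatorname{Im}(M) & \operatorname{Re}(M) \end{bmatrix}, \quad M = \frac{2}{\sigma^2}\Gamma^H A^H(\omega)A(\omega)\Gamma, \qquad (4.25)$$

$$\mathcal{M}_n = \begin{bmatrix} \operatorname{Re}(M_n) & -\operatorname{Im}(M_n) \\ \operatorname{Im}(M_n) & \operatorname{Re}(M_n) \end{bmatrix}, \quad M_n = \frac{2}{\sigma^2}\Gamma^H A^H(\omega)A(\omega)S(n), \quad \mathcal{L}_n = \begin{bmatrix} \operatorname{Re}(C_n) \\ \operatorname{Im}(C_n) \end{bmatrix}, \quad C_n = \frac{2}{\sigma^2}\Gamma^H A^H(\omega)D(\omega)\Gamma S(n), \qquad (4.26)$$

$$\mathcal{L} = \begin{bmatrix} \operatorname{Re}(\xi) \\ \operatorname{Im}(\xi) \end{bmatrix}, \quad \xi = \frac{2}{\sigma^2}\sum_{n=1}^{N} S^H(n)A^H(\omega)D(\omega)\Gamma S(n), \qquad (4.27)$$

$$\mathcal{K}_\gamma = \begin{bmatrix} \operatorname{Re}(K_\gamma) & -\operatorname{Im}(K_\gamma) \\ \operatorname{Im}(K_\gamma) & \operatorname{Re}(K_\gamma) \end{bmatrix}, \quad K_\gamma = \frac{2}{\sigma^2}\sum_{n=1}^{N} S^H(n)A^H(\omega)A(\omega)S(n), \qquad (4.28)$$

and

$$\mathcal{K}_\omega = \frac{2}{\sigma^2}\sum_{n=1}^{N}\operatorname{Re}\left(S^H(n)\Gamma^H D^H(\omega)D(\omega)\Gamma S(n)\right). \qquad (4.29)$$

In the equations above, we redefine $S(n) = \operatorname{diag}\left(s^{(1)}(n), \ldots, s^{(K)}(n)\right)$ and

$$D(\omega) = \left[\frac{\partial a(\omega_1)}{\partial \omega_1}\ \ \frac{\partial a(\omega_2)}{\partial \omega_2}\ \cdots\ \frac{\partial a(\omega_K)}{\partial \omega_K}\right]. \qquad (4.30)$$

⁷ It is also possible to include the noise variance parameter $\sigma^2$ in the parameter vector. However, this parameter decouples from the other parameters, resulting in an optimistic CRB of $\sigma^4/(MN)$ [16, 59], and so is uninteresting to the results herein.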


4.2.1.3 Properties of the FIM


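The model in (4.19) admits an ambiguity, as $\gamma_k s^{(k)}(n)$ is indistinguishable from $(c_k\gamma_k)\left(c_k^{-1}s^{(k)}(n)\right)$ for any nonzero $c_k \in \mathbb{C}$. From Section 2.2, it should then be expected that the FIM in (4.24) is singular. This is indeed the case; e.g., note that

$$\begin{bmatrix}
\mathcal{M} & 0 & \cdots & 0 & \mathcal{M}_1 & C_1 \\
0 & \mathcal{M} & \cdots & 0 & \mathcal{M}_2 & C_2 \\
\vdots & & \ddots & & \vdots & \vdots \\
0 & 0 & \cdots & \mathcal{M} & \mathcal{M}_N & C_N \\
\mathcal{M}_1^H & \mathcal{M}_2^H & \cdots & \mathcal{M}_N^H & \mathcal{K}_\gamma & \mathcal{L} \\
C_1^H & C_2^H & \cdots & C_N^H & \mathcal{L}^H & K_\omega
\end{bmatrix}
\begin{bmatrix} S(1) \\ S(2) \\ \vdots \\ S(N) \\ -\Gamma \\ 0 \end{bmatrix} = 0,$$

since $S(n)$ and $\Gamma$ are both diagonal and therefore commute.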
And since the FIM consists of real and imaginary parts of this matrix, it too has a null space; namely, the columns of

$$N(\theta) = \begin{bmatrix}
\mathrm{Re}(S(1)) & -\mathrm{Im}(S(1)) \\
\mathrm{Im}(S(1)) & \mathrm{Re}(S(1)) \\
\vdots & \vdots \\
\mathrm{Re}(S(N)) & -\mathrm{Im}(S(N)) \\
\mathrm{Im}(S(N)) & \mathrm{Re}(S(N)) \\
-\mathrm{Re}(\Gamma) & \mathrm{Im}(\Gamma) \\
-\mathrm{Im}(\Gamma) & -\mathrm{Re}(\Gamma) \\
0 & 0
\end{bmatrix}$$

are a basis for the null space (or at least a null subspace) of $I(\theta)$. So the nullity, or dimension of the null space, of $I(\theta)$ is at least $2K$. In fact, it is exactly $2K$ (provided N > M^K, for reasons similar to those given in theorem 4.12).
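The nullity claim can be checked numerically. The sketch below is a minimal illustration under stated assumptions, not the author's code: it assumes a hypothetical uniform linear array with steering vector $a_m(\omega) = e^{jm\omega}$, $m = 0, \ldots, M-1$ (the text does not fix $a(\omega)$ here), known $\sigma^2$, and builds the FIM from the standard Slepian-Bangs form $I(\theta) = \frac{2}{\sigma^2}\mathrm{Re}(J^H J)$, where $J$ is the Jacobian of the stacked mean with respect to the real parameters ordered as in (4.23). Its blocks should coincide with (4.25)-(4.30). The function names `steering`, `calibrated_fim`, and `null_basis` are illustrative only.

```python
import numpy as np

def steering(omega, M):
    """ASSUMED ULA model a_m(omega) = exp(j*m*omega), m = 0..M-1.
    Returns A(omega) and D(omega) = [da(omega_1)/domega_1, ...], both M x K."""
    m = np.arange(M)[:, None]
    A = np.exp(1j * m * omega[None, :])
    return A, 1j * m * A

def calibrated_fim(S, gamma, omega, M, sigma2):
    """Real FIM (2/sigma2) Re(J^H J) for mu(n) = A(omega) Gamma s(n), known sigma2.
    Parameter order (4.23): Re s(1), Im s(1), ..., Re gamma, Im gamma, omega."""
    K, N = S.shape
    A, D = steering(omega, M)
    b = 2 * K * N                                     # offset of the gain/angle block
    J = np.zeros((M * N, b + 3 * K), dtype=complex)   # d vec(mu) / d theta^T
    for n in range(N):
        r, c = slice(n * M, (n + 1) * M), 2 * K * n
        J[r, c:c + K] = A * gamma                     # Re s(n): A Gamma
        J[r, c + K:c + 2 * K] = 1j * A * gamma        # Im s(n): j A Gamma
        J[r, b:b + K] = A * S[:, n]                   # Re gamma: A S(n)
        J[r, b + K:b + 2 * K] = 1j * A * S[:, n]      # Im gamma: j A S(n)
        J[r, b + 2 * K:] = D * (gamma * S[:, n])      # omega: D Gamma S(n)
    return (2.0 / sigma2) * np.real(J.conj().T @ J)

def null_basis(S, gamma):
    """Columns of N(theta): the scaling directions s_k -> c_k s_k, gamma_k -> gamma_k / c_k."""
    K, N = S.shape
    b = 2 * K * N
    Nmat = np.zeros((b + 3 * K, 2 * K))
    for n in range(N):
        Sn, c = np.diag(S[:, n]), 2 * K * n
        Nmat[c:c + K] = np.hstack([Sn.real, -Sn.imag])
        Nmat[c + K:c + 2 * K] = np.hstack([Sn.imag, Sn.real])
    G = np.diag(gamma)
    Nmat[b:b + K] = np.hstack([-G.real, G.imag])
    Nmat[b + K:b + 2 * K] = np.hstack([-G.imag, -G.real])
    return Nmat

rng = np.random.default_rng(0)
M, K, N, sigma2 = 5, 2, 10, 0.1                       # hypothetical sizes
S = rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))
gamma = rng.standard_normal(K) + 1j * rng.standard_normal(K)
omega = np.array([0.3, 1.1])

fim = calibrated_fim(S, gamma, omega, M, sigma2)
Nmat = null_basis(S, gamma)
sv = np.linalg.svd(fim, compute_uv=False)
nullity = fim.shape[0] - np.linalg.matrix_rank(fim, tol=1e-9 * sv[0])
print("max |I N| =", np.abs(fim @ Nmat).max(), " nullity =", nullity)
```

The rank is computed with an explicit relative tolerance because the null directions only vanish to round-off level.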


4.2.2  Constraints  for  the  calibrated  array  model 

4.2.2.1 Constraints on the complex-valued channel gain: $\Gamma = I_{K\times K}$

One approach to eliminating the ambiguity between the complex-valued gain and the source input is to incorporate this gain into the signal. Instead of remodeling the mean of (4.19) by eliminating the unknown gain, i.e., as

$$\mu(n,\theta) = \sum_{k=1}^{K} a(\omega_k)s^{(k)}(n), \qquad (4.31)$$

it is equivalent to impose the constraints $\gamma_k = 1$ for $k = 1, \ldots, K$. The model in (4.31) is the model presented in a paper on "direction finding with narrow-band sensor arrays" by Stoica and Nehorai [67]. Hence, if the theory of Chapter 3 is to be trusted, then results found by imposing the constraints $\Gamma = I_{K\times K}$ (or $\gamma = 1_K$) should be equivalent to the results of Stoica and Nehorai. The $K$ constraints on the complex-valued parameter $\gamma$ are $2K$ constraints on the corresponding real-valued parameters; i.e., the vector $f$ of constraints can be defined as

$$f_k(\theta) = \mathrm{Re}(\gamma_k) - 1 = 0, \qquad f_{K+k}(\theta) = \mathrm{Im}(\gamma_k) = 0$$

for $k = 1, \ldots, K$. For the parameter vector in (4.23), the Jacobian of these constraints is given by

$$F(\theta) = \begin{bmatrix} 0_{2K\times 2KN} & I_{2K\times 2K} & 0_{2K\times K} \end{bmatrix}.$$


Since $F(\theta)N(\theta) = -I_{2K\times 2K}$ (evaluated at $\Gamma = I_{K\times K}$) is nonsingular, by theorem 3.23 this constraint is sufficient to (locally) identify the parameters, and by theorem 3.24 the matrix $U^T(\theta)I(\theta)U(\theta)$ will be regular for any matrix $U(\theta)$ satisfying (3.6). An orthonormal basis for the null space of the Jacobian is given by the columns of

$$U(\theta) = \begin{bmatrix} I_{2KN\times 2KN} & 0_{2KN\times K} \\ 0_{2K\times 2KN} & 0_{2K\times K} \\ 0_{K\times 2KN} & I_{K\times K} \end{bmatrix},$$

and this generates a reduced "FIM"

$$U^T(\theta)I(\theta)U(\theta) = \begin{bmatrix}
M & 0 & \cdots & 0 & L_1 \\
0 & M & \cdots & 0 & L_2 \\
\vdots & & \ddots & & \vdots \\
0 & 0 & \cdots & M & L_N \\
L_1^T & L_2^T & \cdots & L_N^T & K_\omega
\end{bmatrix},$$

which is equivalent to the Fisher information in [67, equation (E.9)]. Since the $\gamma$ parameters are known, there is no need to characterize the performance potential (or CCRB) of estimating them; it is only of interest to bound the performance of estimators of the transformation

$$\alpha = k(\theta) = \begin{bmatrix} \mathrm{Re}(s(1)) \\ \mathrm{Im}(s(1)) \\ \vdots \\ \mathrm{Re}(s(N)) \\ \mathrm{Im}(s(N)) \\ \omega \end{bmatrix},$$

which has the Jacobian $\frac{\partial k(\theta)}{\partial\theta^T} = U^T(\theta)$. Hence, $\mathrm{CCRB}(\alpha) = \left(U^T(\theta)I(\theta)U(\theta)\right)^{-1}$ is the same as the CRB found by Stoica and Nehorai. This equivalence serves as further validation of the CCRB approach.
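A small numerical sketch of the two facts used above: the selector $U(\theta)$ spans the null space of $F(\theta)$ with orthonormal columns, and, because the Jacobian of $k(\theta)$ is $U^T(\theta)$, the bound on $\alpha$ collapses to $(U^T I U)^{-1}$, which is just the FIM with the gain rows and columns deleted. The FIM here is a hypothetical stand-in (not the calibrated-array FIM): a random positive semidefinite matrix whose null space is spanned by random directions `Nrand` playing the role of $N(\theta)$, chosen so that $F\,N_{\mathrm{rand}}$ is nonsingular. Names such as `gain_constraint` are illustrative.

```python
import numpy as np

def gain_constraint(K, N):
    """F(theta) = [0, I_2K, 0] for Re(gamma_k) - 1 = 0, Im(gamma_k) = 0, and the
    selector U(theta) keeping the source and angle coordinates (ordering (4.23))."""
    P = 2 * K * N + 3 * K
    F = np.hstack([np.zeros((2 * K, 2 * K * N)), np.eye(2 * K), np.zeros((2 * K, K))])
    keep = np.r_[0:2 * K * N, 2 * K * N + 2 * K:P]
    U = np.eye(P)[:, keep]
    return F, U, keep

def ccrb(fim, U):
    """CCRB(theta) = U (U^T I U)^{-1} U^T."""
    return U @ np.linalg.inv(U.T @ fim @ U) @ U.T

rng = np.random.default_rng(1)
K, N = 2, 3
F, U, keep = gain_constraint(K, N)
P = F.shape[1]

# Hypothetical singular "FIM" whose null space is span(Nrand).
Nrand = rng.standard_normal((P, 2 * K))
Proj = np.eye(P) - Nrand @ np.linalg.pinv(Nrand)
R = rng.standard_normal((3 * P, P))
fim = Proj.T @ R.T @ R @ Proj

reduced = U.T @ fim @ U                 # the reduced "FIM"
bound_alpha = U.T @ ccrb(fim, U) @ U    # CCRB of alpha = k(theta), whose Jacobian is U^T
sv = np.linalg.svd(fim, compute_uv=False)
rank_fim = np.linalg.matrix_rank(fim, tol=1e-10 * sv[0])
```

Although the stand-in FIM is singular, the reduced matrix is invertible because no direction of `Nrand` lies in the span of $U$, which is the situation theorem 3.24 describes.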


4.2.2.2 Semiblind constraints: $s(t) = p(t)$ for $t \in \mathcal{T}$

An alternative approach to eliminating the ambiguity between the source and coefficient is to have prior knowledge of some of the source signals. These known elements are often referred to as training or pilot symbols in communications. Knowledge of any $k$th source element $s^{(k)}(t) = p^{(k)}(t)$ at any time sample $t$ resolves the ambiguity in $\gamma_k s^{(k)}(n)$ for all $n$, since $\gamma_k$ is solvable from the observation corresponding to time sample $t$ and can thus be used to solve for the unknown source values when $n \notin \mathcal{T}$. This model can be written as

$$\mu(n,\theta) = \sum_{k=1}^{K} a(\omega_k)\left(\gamma_k p^{(k)}(n)\,\delta_{n\in\mathcal{T}} + \gamma_k s^{(k)}(n)\,\delta_{n\notin\mathcal{T}}\right), \qquad (4.32)$$

where $\delta_{\text{statement}} = 1$ when the statement is true and $\delta_{\text{statement}} = 0$ when the statement is false. This model is equivalent to the model designed by Kozick and Sadler [39, equation (9)], except with $\gamma_k s^{(k)}(n)$ being simply $s^{(k)}(n)$, a distinction that still allows the results to match for the CCRB of a properly chosen transformation. The model is also equivalent to the model designed by Li and Compton [41, equation (2)] when $\mathcal{T} = \{1, 2, \ldots, N\}$. Equivalence of the reparameterized CRBs in [39, 41] with the CCRB can be found in [59]. As in the previous example, the CCRB on the unknown parameters is the inverse of the (unconstrained) FIM after the elimination of its rows and columns corresponding to the known or specified parameters.
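The elimination rule can be made concrete by computing which real coordinates of (4.23) remain unknown for a given pilot set $\mathcal{T}$. The sketch below is illustrative only: `unknown_indices` and `semiblind_ccrb` are hypothetical helpers, and the FIM is a random positive definite stand-in rather than the calibrated-array FIM. It also checks that deleting rows and columns gives the same bound as the general $U$-form of the CCRB with a selector $U$.

```python
import numpy as np

def unknown_indices(K, N, pilot_times):
    """0-based indices of the unknown real parameters in the ordering (4.23) when
    s(t) is known for every t in pilot_times (1-based, as in the text)."""
    idx = []
    for n in range(1, N + 1):
        if n not in pilot_times:
            start = 2 * K * (n - 1)
            idx.extend(range(start, start + 2 * K))   # Re s(n), Im s(n)
    idx.extend(range(2 * K * N, 2 * K * N + 3 * K))   # Re gamma, Im gamma, omega
    return np.array(idx)

def semiblind_ccrb(fim, K, N, pilot_times):
    """Inverse of the FIM after deleting the rows/columns of the known samples."""
    keep = unknown_indices(K, N, pilot_times)
    return keep, np.linalg.inv(fim[np.ix_(keep, keep)])

rng = np.random.default_rng(2)
K, N, pilots = 2, 6, {1, 2}                 # hypothetical sizes and pilot set
P = 2 * K * N + 3 * K
B = rng.standard_normal((3 * P, P))
fim = B.T @ B                               # stand-in positive definite FIM
keep, bound = semiblind_ccrb(fim, K, N, pilots)
U = np.eye(P)[:, keep]                      # selector form of U(theta)
```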

4.2.2.3 Finite alphabet constraint: $s^{(k)}(n) \in \mathbb{S}$

This constraint derives from the assumption that the source elements exist in a (finite) discrete set $\mathbb{S}$. In communications models, this corresponds to digital modulation designs such as pulse amplitude modulation (PAM), quadrature amplitude modulation (QAM), phase-shift keying (PSK), etc. As such, the model can be the same as any of the previous models in (4.19), (4.31), or (4.32), but the defining characteristic is that the source samples exist only on a discrete set. No CRB-type bounds for this model exist in the literature, owing to the problem of differentiability with respect to the parameters.⁸ It is certainly possible to constrain any single real-valued parameter to a discrete set by creating a polynomial (or sine function) whose zeros match the set values. But what information can be gained from a constraint formulation of this discrete-alphabet model? This is perhaps best answered with the following example.
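The polynomial construction mentioned above can be sketched in a few lines. This is an illustration with a hypothetical 4-PAM alphabet $\{-3, -1, 1, 3\}$, not a construction taken from the text: $f(\theta) = \prod_i(\theta - a_i)$ vanishes exactly on the alphabet, and because the roots are distinct, the scalar Jacobian $F(\theta) = f'(\theta)$ is nonzero at every alphabet point, so its null space is empty there.

```python
import numpy as np

alphabet = np.array([-3.0, -1.0, 1.0, 3.0])     # hypothetical 4-PAM alphabet
coeffs = np.poly(alphabet)                      # f(theta) = (theta+3)(theta+1)(theta-1)(theta-3)
dcoeffs = np.polyder(coeffs)                    # F(theta) = df/dtheta

f_at_alphabet = np.polyval(coeffs, alphabet)    # zero: the constraint holds on the set
F_at_alphabet = np.polyval(dcoeffs, alphabet)   # nonzero: every root is simple
print(coeffs, F_at_alphabet)
```

Here $f(\theta) = \theta^4 - 10\theta^2 + 9$ and $f'(\theta) = 4\theta^3 - 20\theta$, which takes the values $-48, 16, -16, 48$ on the alphabet.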

⁸A Chapman-Robbins or Barankin-type bound, which does not require differentiability with respect to the parameters, would be possible, but even this approach does not seem to exist in the literature. Many communications engineers also discount the importance of a mean-square error bound for a digital signal and rely on alternative performance criteria, such as bit-error rate (BER).

Example 4.22. Reconsider the model in example 2.2, $y \sim \mathcal{CN}(\theta, \sigma^2)$. Suppose the parameter is required to satisfy the constraints

$$f_1(\theta) = \left(\mathrm{Re}(\theta)\right)^2 - \tfrac{1}{2} = 0, \qquad f_2(\theta) = \left(\mathrm{Im}(\theta)\right)^2 - \tfrac{1}{2} = 0.$$


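This constraint restricts the real and imaginary parts of $\theta$ to reside in a discrete set, i.e.,

$$\theta \in \mathbb{S} = \left\{\frac{\sqrt{2}}{2} + j\frac{\sqrt{2}}{2},\; \frac{\sqrt{2}}{2} - j\frac{\sqrt{2}}{2},\; -\frac{\sqrt{2}}{2} + j\frac{\sqrt{2}}{2},\; -\frac{\sqrt{2}}{2} - j\frac{\sqrt{2}}{2}\right\}.$$

(In the communications vernacular, this is quadrature PSK or 4-QAM.) The Jacobian for this constraint is

$$F(\theta) = \begin{bmatrix} 2\mathrm{Re}(\theta) & 0 \\ 0 & 2\mathrm{Im}(\theta) \end{bmatrix},$$

which has full column rank for every possible value of $\theta \in \mathbb{S}$. Hence, to satisfy (3.6), $U(\theta) = [\;]$ is a $2 \times 0$ null matrix.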
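A short numerical check of Example 4.22, written as a sketch: at each QPSK point the SVD of $F(\theta)$ leaves no null directions, so $U(\theta)$ has zero columns and the CCRB $U(U^T I U)^{-1}U^T$ degenerates to the $2\times 2$ zero matrix. The FIM of $(\mathrm{Re}\,\theta, \mathrm{Im}\,\theta)$ for a single $y \sim \mathcal{CN}(\theta, \sigma^2)$ is $\frac{2}{\sigma^2}I_2$; the empty-$U$ case is handled explicitly rather than relying on NumPy's treatment of $0\times 0$ inverses. Helper names are illustrative.

```python
import numpy as np

def qpsk_constraint_jacobian(theta):
    """F(theta) for f1 = Re(theta)^2 - 1/2 and f2 = Im(theta)^2 - 1/2."""
    return np.diag([2.0 * theta.real, 2.0 * theta.imag])

def null_space_basis(F, rtol=1e-12):
    """Orthonormal basis (columns) of null(F) from the SVD; may have zero columns."""
    _, s, Vt = np.linalg.svd(F)
    rank = int(np.sum(s > rtol * s.max()))
    return Vt[rank:].T

def ccrb(fim, U):
    """U (U^T I U)^{-1} U^T, taken as the zero matrix when U has no columns."""
    if U.shape[1] == 0:
        return np.zeros_like(fim)
    return U @ np.linalg.inv(U.T @ fim @ U) @ U.T

sigma2 = 1.0
fim = (2.0 / sigma2) * np.eye(2)   # FIM of (Re theta, Im theta) for one y ~ CN(theta, sigma2)
qpsk = (np.sqrt(2) / 2) * np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j])

shapes, bounds = [], []
for theta in qpsk:
    U = null_space_basis(qpsk_constraint_jacobian(theta))
    shapes.append(U.shape)
    bounds.append(ccrb(fim, U))
```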


This is analogous to the determinate case of example 3.14; therefore, knowledge that a parameter exists in a discrete set is equivalent to complete knowledge of the parameter value in terms of mean-square error performance potential, the result being a Cramer-Rao bound of zero. This does not mean that the mean-square error will be zero (the decision of which set value the parameter actually takes can be wrong given sufficient noise power), but it does mean that the mean-square error bound is trivial and not particularly helpful. This result is not surprising considering the degrees of freedom of the parameter that are removed by such constraints: the restriction of a real-valued parameter to satisfy a single root equation eliminates its single degree of freedom.
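To illustrate that a zero bound does not imply zero error, the following Monte Carlo sketch (an illustration, not part of the original analysis) applies the minimum-distance decision to $y \sim \mathcal{CN}(\theta, \sigma^2)$ with $\theta$ drawn from the QPSK set. Each of the real and imaginary parts is decided wrongly with probability $Q(1/\sigma)$, and a wrong decision costs $(\sqrt{2})^2 = 2$ in that component, so the MSE is $4Q(1/\sigma)$: clearly positive at low SNR even though the CCRB is zero.

```python
import math
import numpy as np

QPSK = (math.sqrt(2) / 2) * np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j])

def detection_mse(sigma2, trials=20000, seed=0):
    """Empirical MSE of the minimum-distance decision for y ~ CN(theta, sigma2),
    with theta drawn uniformly from the QPSK set."""
    rng = np.random.default_rng(seed)
    theta = QPSK[rng.integers(0, 4, size=trials)]
    noise = math.sqrt(sigma2 / 2) * (rng.standard_normal(trials) + 1j * rng.standard_normal(trials))
    y = theta + noise
    theta_hat = QPSK[np.argmin(np.abs(y[:, None] - QPSK[None, :]), axis=1)]
    return float(np.mean(np.abs(theta_hat - theta) ** 2))

def detection_mse_theory(sigma2):
    """MSE = 4 Q(1/sigma), with Q(x) = erfc(x / sqrt(2)) / 2."""
    return 4.0 * 0.5 * math.erfc(1.0 / math.sqrt(2.0 * sigma2))

low, high = detection_mse(1.0), detection_mse(0.01)
print(low, detection_mse_theory(1.0), high)
```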
4.2.2.4 Unit modulus constraints: $|s(n)| = 1$ for all $n$


Since knowledge that the parameters come from a discrete (finite) alphabet results in a trivial bound, a relaxation of the side information is necessary to obtain a relevant and useful measure of performance potential. One example of this approach is a constant or unit modulus constraint on the source elements, as in section 4.1.4.4 for the convolutive mixture model. This approach remodels the mean as

$$\mu(n,\theta) = \sum_{k=1}^{K} a(\omega_k)\gamma_k e^{j\phi^{(k)}(n)}. \qquad (4.33)$$

This is the constraint considered in example 3.4, applied to the communications context. Therefore, imposing the constraints

$$f_{(n-1)K+k}(\theta) = |s^{(k)}(n)|^2 - 1 = 0,$$

for $k = 1, \ldots, K$ and $n = 1, \ldots, N$, is an alternative to rederiving the Fisher information for the model in (4.33). The Jacobian for this constraint has the form


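$$F(\theta) = \begin{bmatrix}
2\mathrm{Re}(S(1)) \;\; 2\mathrm{Im}(S(1)) & & 0 & 0_{K\times 3K} \\
& \ddots & & \vdots \\
0 & & 2\mathrm{Re}(S(N)) \;\; 2\mathrm{Im}(S(N)) & 0_{K\times 3K}
\end{bmatrix}, \qquad (4.34)$$

where blank blocks are zero, which has a null space generated by the columns of the matrix

$$U(\theta) = \begin{bmatrix}
-\mathrm{Im}(S(1)) & & 0 & 0 \\
\mathrm{Re}(S(1)) & & 0 & 0 \\
& \ddots & & \vdots \\
0 & & -\mathrm{Im}(S(N)) & 0 \\
0 & & \mathrm{Re}(S(N)) & 0 \\
0 & \cdots & 0 & I_{3K\times 3K}
\end{bmatrix}. \qquad (4.35)$$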
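The structure of (4.34) and (4.35) can be verified directly. The sketch below assembles both matrices for random unit-modulus samples (hypothetical sizes, parameter order (4.23)) and checks that $F(\theta)U(\theta) = 0$ and that the columns of $U(\theta)$ are orthonormal, which holds because $\mathrm{Re}^2(s) + \mathrm{Im}^2(s) = |s|^2 = 1$ on the constraint set.

```python
import numpy as np

def unit_modulus_F_U(S):
    """F of (4.34) and U of (4.35) for |s_k(n)|^2 = 1, parameter order (4.23).
    S is K x N; the last 3K coordinates are Re gamma, Im gamma, omega."""
    K, N = S.shape
    P = 2 * K * N + 3 * K
    F = np.zeros((K * N, P))
    U = np.zeros((P, K * N + 3 * K))
    for n in range(N):
        Sn = np.diag(S[:, n])
        r, c = K * n, 2 * K * n
        F[r:r + K, c:c + K] = 2 * Sn.real           # d|s|^2 / d Re s
        F[r:r + K, c + K:c + 2 * K] = 2 * Sn.imag   # d|s|^2 / d Im s
        U[c:c + K, r:r + K] = -Sn.imag              # tangent (phase) direction j S(n)
        U[c + K:c + 2 * K, r:r + K] = Sn.real
    U[2 * K * N:, K * N:] = np.eye(3 * K)
    return F, U

rng = np.random.default_rng(1)
K, N = 2, 4
S = np.exp(1j * rng.uniform(0, 2 * np.pi, size=(K, N)))   # unit-modulus samples
F, U = unit_modulus_F_U(S)
P = 2 * K * N + 3 * K
```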


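From this we can check the (local) identifiability of this model under the (unit) constant modulus constraint using theorem 3.24. The matrix $U^T(\theta)I(\theta)U(\theta)$ is

$$U^T(\theta)I(\theta)U(\theta) = \begin{bmatrix}
\mathrm{Re}(S^*(1)\mathcal{M}S(1)) & \cdots & 0 & \mathrm{Im}(S^*(1)\mathcal{M}_1) & \mathrm{Re}(S^*(1)\mathcal{M}_1) & \mathrm{Im}(S^*(1)C_1) \\
\vdots & \ddots & \vdots & \vdots & \vdots & \vdots \\
0 & \cdots & \mathrm{Re}(S^*(N)\mathcal{M}S(N)) & \mathrm{Im}(S^*(N)\mathcal{M}_N) & \mathrm{Re}(S^*(N)\mathcal{M}_N) & \mathrm{Im}(S^*(N)C_N) \\
-\mathrm{Im}(\mathcal{M}_1^H S(1)) & \cdots & -\mathrm{Im}(\mathcal{M}_N^H S(N)) & \mathrm{Re}(\mathcal{K}_\gamma) & -\mathrm{Im}(\mathcal{K}_\gamma) & \mathrm{Re}(\mathcal{L}) \\
\mathrm{Re}(\mathcal{M}_1^H S(1)) & \cdots & \mathrm{Re}(\mathcal{M}_N^H S(N)) & \mathrm{Im}(\mathcal{K}_\gamma) & \mathrm{Re}(\mathcal{K}_\gamma) & \mathrm{Im}(\mathcal{L}) \\
-\mathrm{Im}(C_1^H S(1)) & \cdots & -\mathrm{Im}(C_N^H S(N)) & \mathrm{Re}(\mathcal{L}^H) & -\mathrm{Im}(\mathcal{L}^H) & K_\omega
\end{bmatrix}.$$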
Since

$$U^T(\theta)I(\theta)U(\theta)\begin{bmatrix} -I_{K\times K} \\ \vdots \\ -I_{K\times K} \\ -\mathrm{Im}(\Gamma) \\ \mathrm{Re}(\Gamma) \\ 0_{K\times K} \end{bmatrix} = 0$$

(the $N$ blocks $-I_{K\times K}$ rotate the phase of every sample of each source, and the gain blocks apply the opposite rotation to the channel gains), $U^T(\theta)I(\theta)U(\theta)$ is singular, and by theorem 3.24 the (unit) constant modulus constraints are not sufficient for identifiability. This was also the case for the convolutive mixture model in section 4.1.4.4. Furthermore, reviewing the model in (4.33), this result is all the more to be expected. The original identifiability issue in the calibrated model (4.19) is the multiplicative ambiguity between the sources and the channel gain. While the constant modulus constraint resolves any amplitude ambiguity of the channel gain, i.e., $|\gamma_k| = |\gamma_k s^{(k)}(n)|$ for any $n$, there still exists a phase rotation ambiguity. It is clear from both the model and theorem 3.23 that, for each source $k$, knowledge of either the real or the imaginary part of either the channel gain or a source sample will be sufficient for identifiability and a regular $U^T(\theta)I(\theta)U(\theta)$. Of course, given this constant unit modulus constraint, knowledge of the real (imaginary) part of any source sample is equivalent to restricting the imaginary (real) part of that source sample to a finite discrete alphabet, which, as discussed in section 4.2.2.3, is the same as knowledge of the parameter with regard to the CCRB performance potential, but not with regard to the estimation. Hence, for estimation performance, it is necessary to constrain the real and/or imaginary part of the source sample.
4.2.2.5 Unit modulus constraint; real-valued channel gain: $\mathrm{Im}(\gamma_k) = 0$ for all $k$

This model is that given in (4.33), except with $\gamma_k$ being real-valued. As seen in sections 4.2.2.2 and 4.2.2.3, the constraint is knowledge of the imaginary parts of $\gamma$, and the effect of the corresponding $U(\theta)$ matrix is the elimination of the columns and rows corresponding to the $\mathrm{Im}(\gamma_k)$ parameters. This model is equivalent to that of Leshem and van der Veen [40]. Verification that the CCRB is equivalent can be found in [59]. The bound is essentially the inverse of the reduced-parameter-space Fisher information $U^T(\theta)I(\theta)U(\theta)$ from section 4.2.2.4 after the elimination of the rows and columns corresponding to the imaginary parts of the channel gains.
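Both conclusions of sections 4.2.2.4 and 4.2.2.5 can be checked numerically in one sketch. Under the same assumptions as before (a hypothetical ULA with $a_m(\omega) = e^{jm\omega}$, known $\sigma^2$, Slepian-Bangs FIM), the code builds $U^T I U$ for the unit modulus constraint with real-valued gains, confirms that the phase-rotation vector is in its null space (nullity exactly $K$), and then deletes the $\mathrm{Im}(\gamma_k)$ rows and columns to show that the resulting matrix is positive definite.

```python
import numpy as np

def calibrated_fim(S, gamma, omega, M, sigma2):
    """(2/sigma2) Re(J^H J) for mu(n) = A(omega) Gamma s(n); ASSUMED ULA
    a_m(omega) = exp(j m omega); parameter order (4.23)."""
    K, N = S.shape
    m = np.arange(M)[:, None]
    A = np.exp(1j * m * omega[None, :])
    D = 1j * m * A
    b = 2 * K * N
    J = np.zeros((M * N, b + 3 * K), dtype=complex)
    for n in range(N):
        r, c = slice(n * M, (n + 1) * M), 2 * K * n
        J[r, c:c + K], J[r, c + K:c + 2 * K] = A * gamma, 1j * A * gamma
        J[r, b:b + K], J[r, b + K:b + 2 * K] = A * S[:, n], 1j * A * S[:, n]
        J[r, b + 2 * K:] = D * (gamma * S[:, n])
    return (2.0 / sigma2) * np.real(J.conj().T @ J)

def unit_modulus_U(S):
    """U(theta) of (4.35)."""
    K, N = S.shape
    U = np.zeros((2 * K * N + 3 * K, K * N + 3 * K))
    for n in range(N):
        Sn, c = np.diag(S[:, n]), 2 * K * n
        U[c:c + K, K * n:K * (n + 1)] = -Sn.imag
        U[c + K:c + 2 * K, K * n:K * (n + 1)] = Sn.real
    U[2 * K * N:, K * N:] = np.eye(3 * K)
    return U

rng = np.random.default_rng(5)
M, K, N, sigma2 = 5, 2, 10, 0.1                          # hypothetical sizes
S = np.exp(1j * rng.uniform(0, 2 * np.pi, (K, N)))       # unit-modulus sources
gamma = rng.uniform(0.5, 2.0, K)                         # real-valued channel gains
omega = np.array([0.3, 1.1])

U = unit_modulus_U(S)
R = U.T @ calibrated_fim(S, gamma, omega, M, sigma2) @ U  # reduced FIM of 4.2.2.4

# Phase-rotation direction: -I for every sample block, (-Im Gamma, Re Gamma) for the gains.
G = np.diag(gamma)
v = np.vstack([-np.eye(K)] * N + [-G.imag, G.real, np.zeros((K, K))])

# Section 4.2.2.5: delete the Im(gamma_k) rows and columns.
im_idx = np.arange(K * N + K, K * N + 2 * K)
R_real = np.delete(np.delete(R, im_idx, axis=0), im_idx, axis=1)
```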


4.2.2.6 Semi-blind and unit modulus constraint


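This model merges the models in (4.33) and (4.32). Without loss of generality, assume the source elements are known for the first $T$ time slots for each source. Then the constraint Jacobian is

$$F(\theta) = \begin{bmatrix}
I_{2TK\times 2TK} & 0 & \cdots & 0 & 0_{2TK\times 3K} \\
0 & 2\mathrm{Re}(S(T+1)) \;\; 2\mathrm{Im}(S(T+1)) & \cdots & 0 & 0_{K\times 3K} \\
\vdots & & \ddots & & \vdots \\
0 & 0 & \cdots & 2\mathrm{Re}(S(N)) \;\; 2\mathrm{Im}(S(N)) & 0_{K\times 3K}
\end{bmatrix}, \qquad (4.36)$$

which has an orthonormal complement

$$U(\theta) = \begin{bmatrix}
0_{2TK\times K} & \cdots & 0_{2TK\times K} & 0_{2TK\times 3K} \\
-\mathrm{Im}(S(T+1)) & & 0 & 0 \\
\mathrm{Re}(S(T+1)) & & 0 & 0 \\
& \ddots & & \vdots \\
0 & & -\mathrm{Im}(S(N)) & 0 \\
0 & & \mathrm{Re}(S(N)) & 0 \\
0 & \cdots & 0 & I_{3K\times 3K}
\end{bmatrix}. \qquad (4.37)$$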
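The reduced FIM is then

$$U^T(\theta)I(\theta)U(\theta) = \begin{bmatrix}
G(T+1) & 0 & \cdots & 0 & C(T+1) & B(T+1) \\
0 & G(T+2) & \cdots & 0 & C(T+2) & B(T+2) \\
\vdots & & \ddots & & \vdots & \vdots \\
0 & 0 & \cdots & G(N) & C(N) & B(N) \\
C^T(T+1) & C^T(T+2) & \cdots & C^T(N) & K_\gamma & L \\
B^T(T+1) & B^T(T+2) & \cdots & B^T(N) & L^T & K_\omega
\end{bmatrix}, \qquad (4.38)$$

where $G(t) = \mathrm{Re}[S^*(t)\mathcal{M}S(t)]$ with $\mathcal{M}$ defined as in (4.25), $C(t) = \begin{bmatrix}\mathrm{Im}[S^*(t)\mathcal{M}_t] & \mathrm{Re}[S^*(t)\mathcal{M}_t]\end{bmatrix}$ with $\mathcal{M}_t$ defined in (4.26), and $B(t) = \mathrm{Im}[S^*(t)C_t]$ with $C_t$ defined in (4.27). To invert this matrix analytically, the Schur complement

$$\Phi = \begin{bmatrix}
K_\gamma - \sum_{t=T+1}^{N} C^T(t)G^{-1}(t)C(t) & L - \sum_{t=T+1}^{N} C^T(t)G^{-1}(t)B(t) \\
L^T - \sum_{t=T+1}^{N} B^T(t)G^{-1}(t)C(t) & K_\omega - \sum_{t=T+1}^{N} B^T(t)G^{-1}(t)B(t)
\end{bmatrix} \qquad (4.39)$$

is useful. If $\Phi^{-1}$ is partitioned into the subblocks $[\Phi^{-1}]_{\gamma\gamma}$ ($2K \times 2K$) and $[\Phi^{-1}]_{\omega\omega}$ ($K \times K$) corresponding to $(\mathrm{Re}(\gamma), \mathrm{Im}(\gamma))$ and $\omega$, then the CCRB subblocks for the unknown elements are given by

$$\mathrm{CCRB}(\omega) = [\Phi^{-1}]_{\omega\omega}, \qquad \mathrm{CCRB}(\gamma) = [\Phi^{-1}]_{\gamma\gamma},$$

$$\mathrm{CCRB}\left(\begin{bmatrix} \mathrm{Re}(s(t)) \\ \mathrm{Im}(s(t)) \end{bmatrix}\right) = \begin{bmatrix} -\mathrm{Im}(S(t)) \\ \mathrm{Re}(S(t)) \end{bmatrix} X(t) \begin{bmatrix} -\mathrm{Im}(S(t)) & \mathrm{Re}(S(t)) \end{bmatrix}, \qquad t = T+1, \ldots, N,$$

where

$$X(t) = G^{-1}(t) + G^{-1}(t)\begin{bmatrix} C(t) & B(t) \end{bmatrix}\Phi^{-1}\begin{bmatrix} C^T(t) \\ B^T(t) \end{bmatrix}G^{-1}(t).$$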
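The Schur-complement route above only requires inverting the $K\times K$ blocks $G(t)$ and the $3K\times 3K$ matrix $\Phi$. The sketch below checks the block-inverse formulas numerically on a random symmetric positive definite matrix with the block-arrow layout of (4.38); the blocks are random stand-ins, not the calibrated-array quantities, and `Kmat` is chosen so that $\Phi = YY^T + I$ is positive definite.

```python
import numpy as np

def block_arrow(Gs, Hs, Kmat):
    """Assemble [[blkdiag(G_t), H], [H^T, Kmat]], H = vertical stack of H_t = [C(t) B(t)]."""
    g, q, T = Gs[0].shape[0], Kmat.shape[0], len(Gs)
    Z = np.zeros((g * T + q, g * T + q))
    for t, (G, H) in enumerate(zip(Gs, Hs)):
        s = slice(g * t, g * (t + 1))
        Z[s, s] = G
        Z[s, g * T:] = H
        Z[g * T:, s] = H.T
    Z[g * T:, g * T:] = Kmat
    return Z

def schur_ccrb_blocks(Gs, Hs, Kmat):
    """Phi^{-1} (cf. (4.39)) and X(t) = G^{-1} + G^{-1} H_t Phi^{-1} H_t^T G^{-1}."""
    Ginv = [np.linalg.inv(G) for G in Gs]
    Phi = Kmat - sum(H.T @ Gi @ H for Gi, H in zip(Ginv, Hs))
    Phi_inv = np.linalg.inv(Phi)
    X = [Gi + Gi @ H @ Phi_inv @ H.T @ Gi for Gi, H in zip(Ginv, Hs)]
    return Phi_inv, X

rng = np.random.default_rng(6)
K, blocks = 2, 5                        # G(t) is K x K; Phi is 3K x 3K
Gs, Hs = [], []
for _ in range(blocks):
    W = rng.standard_normal((K, K))
    Gs.append(W @ W.T + np.eye(K))                  # symmetric positive definite G(t)
    Hs.append(rng.standard_normal((K, 3 * K)))      # H_t = [C(t) B(t)]
Y = rng.standard_normal((3 * K, 3 * K))
Kmat = Y @ Y.T + np.eye(3 * K) + sum(H.T @ np.linalg.inv(G) @ H for G, H in zip(Gs, Hs))

Z = block_arrow(Gs, Hs, Kmat)
Phi_inv, X = schur_ccrb_blocks(Gs, Hs, Kmat)
Zinv = np.linalg.inv(Z)
```

The lower-right block of the full inverse equals $\Phi^{-1}$, and each diagonal block equals $X(t)$; the CCRB on $s(t)$ then follows by the congruence with $[-\mathrm{Im}(S(t));\ \mathrm{Re}(S(t))]$.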


Figure 4.6: CCRBs on AOA for blind, constant modulus, and known signal models. (Vertical axis: CRB on source 2 AOA; horizontal axis: source 2 AOA in degrees.)

Example 4.23. A distinct advantage of the CCRB, as implemented in this treatise, is the ability to compare the performance potential of a number of different models seamlessly. Suppose we consider $M = 5$ omnidirectional sensors with a beamwidth of $\approx 23.6^\circ$ in a uniform linear array receiving $K = 2$ source signals over $N = 100$ time samples. Then it is possible to compare various communications design scenarios, e.g.,

(a) the "blind" case⁹: $s_k(1)$ known for $k = 1, 2$,

(b) the unit modulus case: $|s_k(t)|^2 = 1$ for $k = 1, 2$ and $t = 1, \ldots, 100$,

(c) the semiblind case: $s_k(t)$ known for $k = 1, 2$ and $t = 1, \ldots, T = 20$,

(d) the unit modulus and semiblind case: $s_k(t)$ known for $k = 1, 2$ and $t = 1, \ldots, T = 20$, and $|s_k(t)|^2 = 1$ for $k = 1, 2$ and $t = T + 1, \ldots, 100$, and

(e) the known signal case: $s_k(t)$ known for $k = 1, 2$ and $t = 1, \ldots, 100$.

⁹This is not a truly blind scenario, but it is often referred to as blind in the literature.

Figure 4.7: CCRBs on AOA for semiblind, constant modulus + semiblind, and known signal models. (Vertical axis: CRB on source 2 AOA.)
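The quoted beamwidth of about 23.6° can be reproduced with one line of arithmetic, though only under assumptions that the text does not state: half-wavelength element spacing, and "beamwidth" read as the angle from broadside to the first null of the $M$-element array factor, which occurs at $\sin\theta = \lambda/(Md)$. Under those assumptions $\arcsin(2/5) \approx 23.58^\circ$.

```python
import math

M = 5                 # sensors in Example 4.23
d_over_lambda = 0.5   # ASSUMED half-wavelength spacing (not stated in the text)

# ASSUMED definition: broadside-to-first-null angle, where the ULA array factor
# first vanishes at sin(theta) = lambda / (M d).
beamwidth_deg = math.degrees(math.asin(1.0 / (M * d_over_lambda)))
print(f"{beamwidth_deg:.2f} degrees")
```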
Figure 4.6 compares the CCRBs on angle-of-arrival (AOA) estimation for the second source signal under the blind, unit modulus constraint, and known signal model designs, over varying directions and different signal-to-noise ratios (SNRs), with the first source signal arriving at 0°. Figure 4.7 compares the CCRBs on AOA estimation under semiblind constraints, unit modulus with semiblind constraints, and the known signal model design. Finally, figure 4.8 displays CCRBs on signal phase estimation under blind, unit modulus constraints, semiblind constraints, and a mixture of unit modulus and semiblind constraints. The known signal model is the best-case scenario for AOA estimation potential and is a useful benchmark for more desirable scenarios where information (data or unknown parameters) is included in the transmission. Figures 4.6 and 4.7 demonstrate the characteristic loss of performance when the sources' AOAs differ by roughly the beamwidth. The value of the unit modulus constraint is evident when the sources' AOAs are closely spaced, as the CCRB performance potential approximates that of the known signal model. In figures 4.7 and 4.8, the estimation potential actually improves for closely spaced sources with the mixture of semiblind and unit modulus constraints.

4.3  Discussion 

This chapter includes extensions of only a brief sampling of my prior research [59, 39, 48, 51, 49] as it relates to the practical application of the CCRB. In this chapter, the convolutive mixture model and the calibrated array model were treated as base models for which the Fisher information was derived a single time, and then a series of variations on the models, in the form of differentiable parametric constraints, were considered. This approach presents a simple procedure to compare and contrast a large class of constraints (essentially different models) efficiently, in order to determine the value of a particular formulation in terms of performance potential as measured by the CCRB.

Figure 4.8: CCRBs on signal phase for blind, constant modulus, semiblind, and constant modulus + semiblind models. (Vertical axis: mean CRB on source 2 signal phase.)
Appendix A

Appendices

A.1 A proof of the CCRB using the Chapman-Robbins version of the Barankin bound


Gorman and Hero developed a CCRB using the multiparameter version of the Hammersley-Chapman-Robbins bound (HCRB) [16, 26]. However, the result produced a variant form of the CCRB,

$$I^{-1}(\theta) - I^{-1}(\theta)F^T(\theta)\left(F(\theta)I^{-1}(\theta)F^T(\theta)\right)^{-1}F(\theta)I^{-1}(\theta),$$

which requires a nonsingular FIM. What follows is a shorter variation of their approach that does not assume a nonsingular FIM, starting with a brief description of the HCRB.


Rather than relying on the Fisher score, which is the derivative of the log-likelihood, i.e.,

$$s(x;\theta) = \frac{\partial}{\partial\theta}\log p(x;\theta) = \frac{1}{p(x;\theta)}\frac{\partial p(x;\theta)}{\partial\theta},$$

the regularity conditions requiring a differentiable likelihood can be relaxed by considering finite differences, i.e., for each $i = 1, \ldots, m$,

$$\frac{1}{p(x;\theta)}\,\frac{p(x;\theta + \delta_i e_i) - p(x;\theta)}{\delta_i},$$

where the $e_i$ are canonical unit vectors. If the likelihood is differentiable, then the limit as each $\delta_i \to 0$ is the Fisher score. Of course, the finite differences need not be taken along the canonical axes, and the number of finite differences need not equal the dimension of the parameters. This is the generalization of the CRB known as the Hammersley-Chapman-Robbins (HCR) version of the Barankin bound.


Theorem A.1. If $t(x)$ is an unbiased estimate of $h(\theta) \in h(\mathbb{R}^m) \subseteq \mathbb{R}^d$, and if $\psi = [\psi^{(1)}, \ldots, \psi^{(p)}]$ is a matrix whose columns are test points in $\mathbb{R}^m$, all distinct from each other as well as from $\theta$, then the variance of $t(x)$ is bounded below by the inequality

$$\mathrm{Var}(t(x)) \ge \sup_{\psi}\, \Delta(\theta,\psi)\,\Gamma^{-1}(\theta,\psi)\,\Delta^T(\theta,\psi),$$

where $\Delta(\theta,\psi)$ is called a translation matrix, defined by

$$\Delta(\theta,\psi) = \left[h(\psi^{(1)}) - h(\theta), \ldots, h(\psi^{(p)}) - h(\theta)\right],$$

and $\Gamma(\theta,\psi)$ is called an HCR information matrix, defined by

$$\Gamma_{ij}(\theta,\psi) = E_\theta\left[\frac{p(x;\psi^{(i)}) - p(x;\theta)}{p(x;\theta)}\cdot\frac{p(x;\psi^{(j)}) - p(x;\theta)}{p(x;\theta)}\right].$$

This result encompasses the CRB result when the test points satisfy $\psi^{(i)} = \theta + \delta_i e_i$ and the $\delta_i \to 0$ for $i = 1, \ldots, m = p$, i.e., as a properly chosen set of test points approaches the parameter. If the set of vectors $\psi^{(1)} - \theta, \ldots, \psi^{(p)} - \theta$ spans an $m$-dimensional space, then the limit as the finite differences approach derivatives will still yield the CRB.
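A one-dimensional sketch makes the limit concrete. For a single observation $x \sim \mathcal{N}(\theta, \sigma^2)$, $h(\theta) = \theta$, and one test point $\psi = \theta + \delta$, a standard Gaussian calculation gives $\Gamma = e^{\delta^2/\sigma^2} - 1$, so the HCR bound is $\delta^2/(e^{\delta^2/\sigma^2} - 1)$, which stays below the CRB $\sigma^2$ and approaches it as $\delta \to 0$. The code evaluates $\Gamma$ by direct numerical integration of its definition (a plain Riemann sum on a fine grid) and compares it with the closed form; the example model is chosen for illustration only.

```python
import numpy as np

def hcr_gamma_numeric(delta, sigma=1.0, theta=0.0, num=200001, half_width=12.0):
    """Gamma(theta, psi) for psi = theta + delta and x ~ N(theta, sigma^2), from
    E_theta[((p(x;psi) - p(x;theta)) / p(x;theta))^2] by numerical integration."""
    x = np.linspace(theta - half_width * sigma, theta + half_width * sigma, num)
    dx = x[1] - x[0]
    p = np.exp(-(x - theta) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)
    ratio = np.exp(((x - theta) ** 2 - (x - theta - delta) ** 2) / (2 * sigma ** 2))  # p(x;psi)/p(x;theta)
    return np.sum((ratio - 1.0) ** 2 * p) * dx

def hcr_bound(delta, sigma=1.0):
    """Delta Gamma^{-1} Delta^T with h(theta) = theta: delta^2 / (exp(delta^2/sigma^2) - 1)."""
    return delta ** 2 / np.expm1(delta ** 2 / sigma ** 2)

for d in (1.0, 0.1, 0.01, 1e-4):
    print(d, hcr_bound(d))   # increases toward the CRB sigma^2 = 1 as d -> 0
```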

Since $f(\theta) = 0$, it makes sense to restrict the test points to also satisfy the constraints, $f(\psi^{(i)}) = 0$, and examine the limit of the HCRB as the finite differences approach derivatives. The Taylor series approximation of $f(\psi^{(i)})$ about $\theta$ is

$$f(\psi^{(i)}) = f(\theta) + F(\theta)\left(\psi^{(i)} - \theta\right) + o\left(\|\psi^{(i)} - \theta\|\right).$$

Since $f(\psi^{(i)}) = f(\theta) = 0$, $\psi^{(i)} - \theta$ resides almost entirely in $\mathrm{null}(F(\theta))$. So without loss of generality we can allow

$$\psi^{(i)} = \theta + \delta_i u_i(\theta) + o\left(\|\psi^{(i)} - \theta\|\right),$$

where $u_i(\theta)$ is the $i$th column of the matrix $U(\theta)$ satisfying (3.6). Then

$$\frac{p(x;\psi^{(i)}) - p(x;\theta)}{\delta_i\, p(x;\theta)} = \frac{p\left(x;\theta + \delta_i u_i(\theta) + o(\|\psi^{(i)} - \theta\|)\right) - p(x;\theta)}{\delta_i\, p(x;\theta)} \to \frac{u_i^T(\theta)\frac{\partial p(x;\theta)}{\partial\theta}}{p(x;\theta)} = u_i^T(\theta)s(x;\theta)$$

and

$$\frac{\psi^{(i)} - \theta}{\delta_i} = u_i(\theta) + \frac{1}{\delta_i}\,o\left(\|\psi^{(i)} - \theta\|\right) \to u_i(\theta)$$

as $\delta_i \to 0$. This is true for any $i$, so if the test points are chosen such that each column of $U(\theta)$ is used, this gives us the CCRB

$$U(\theta)\left(U^T(\theta)I(\theta)U(\theta)\right)^{-1}U^T(\theta)$$

as the limit of a constrained HCRB when the finite differences become derivatives.
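A minimal constrained instance of this limit, chosen for illustration: $x \sim \mathcal{N}(\theta, \sigma^2 I_2)$ with the linear constraint $f(\theta) = \theta_1 - \theta_2 = 0$, so $U = [1, 1]^T/\sqrt{2}$, and a single test point $\psi = \theta + \delta u_1$ that satisfies the constraint. For a Gaussian with covariance $\sigma^2 I$, $\Gamma = e^{\|\psi - \theta\|^2/\sigma^2} - 1 = e^{\delta^2/\sigma^2} - 1$, so the constrained HCRB is $\frac{\delta^2}{e^{\delta^2/\sigma^2}-1}u_1u_1^T$, which tends to the CCRB $\sigma^2 u_1 u_1^T$ as $\delta \to 0$.

```python
import numpy as np

sigma2 = 0.5                                  # hypothetical noise variance
F = np.array([[1.0, -1.0]])                   # Jacobian of f(theta) = theta_1 - theta_2
U = np.array([[1.0], [1.0]]) / np.sqrt(2.0)   # orthonormal basis of null(F), as in (3.6)
fim = np.eye(2) / sigma2                      # FIM of x ~ N(theta, sigma2 I_2), one sample

def constrained_hcrb(delta):
    """Delta Gamma^{-1} Delta^T for the feasible test point psi = theta + delta * u_1."""
    Delta = delta * U                                          # translation matrix, h(theta) = theta
    Gamma = np.array([[np.expm1(delta ** 2 / sigma2)]])        # E[(p(psi)/p(theta) - 1)^2]
    return Delta @ np.linalg.inv(Gamma) @ Delta.T

ccrb = U @ np.linalg.inv(U.T @ fim @ U) @ U.T                  # equals sigma2 * u_1 u_1^T
```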


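A.2 A proof of the CCRB using the method of implicit differentiation

Suppose $\theta$ is restricted to the zeros of $f : \mathbb{R}^m \to \mathbb{R}^k$ with Jacobian $F(\theta) = \frac{\partial f(\theta)}{\partial\theta^T}$ having rank $k$ whenever $f(\theta) = 0$. The method of implicit differentiation assumes that parameters that would be eliminated under a reparameterization can be written in terms of the remaining parameters, since the conditions satisfy the implicit function theorem. The actual function generally remains unknown, but its derivative is calculated by first taking partial derivatives of the constraint function and using linear algebra to solve for the partial derivatives of the eliminated parameters in terms of the remaining parameters. This approach was also used by Marzetta [47, proof of theorem 1] to prove the regularity conditions given in (3.20).

The parameter vector may be separated as $\theta = \begin{bmatrix}\theta_1 \\ \theta_2\end{bmatrix}$, and the constraint $f$ may be rewritten as $f^* : \mathbb{R}^{m-k} \times \mathbb{R}^k \to \mathbb{R}^k$ via the mapping $f^*(\theta_1, \theta_2) = f\left(\begin{bmatrix}\theta_1 \\ \theta_2\end{bmatrix}\right)$. Then the Jacobian of $f$ can be represented as

$$F(\theta) = \begin{bmatrix} f^*_{\theta_1}(\theta_1,\theta_2) & f^*_{\theta_2}(\theta_1,\theta_2) \end{bmatrix},$$

where $f^*_{\theta_i}(\theta_1,\theta_2) = \frac{\partial}{\partial\theta_i^T}f^*(\theta_1,\theta_2)$ for each $i = 1, 2$.

Without loss of generality, assume $\theta_2 \in \mathbb{R}^k$ is a function of $\theta_1 \in \mathbb{R}^{m-k}$, i.e., $\theta_2 = \theta_2(\theta_1)$ is an implicit function. Therefore, $f^*$ is implicitly a function of $\theta_1$ only, and

$$\frac{\partial}{\partial\theta_1^T}f^*(\theta_1,\theta_2(\theta_1)) = f^*_{\theta_1}(\theta_1,\theta_2(\theta_1)) + f^*_{\theta_2}(\theta_1,\theta_2(\theta_1))\,\frac{\partial\theta_2(\theta_1)}{\partial\theta_1^T}.$$

If this derivative is only taken where $f^*(\theta_1,\theta_2) = 0$, it vanishes identically, so in matrix form

$$\begin{bmatrix} f^*_{\theta_1}(\theta_1,\theta_2) & f^*_{\theta_2}(\theta_1,\theta_2) \end{bmatrix}\begin{bmatrix} I_{(m-k)\times(m-k)} \\ \frac{\partial\theta_2(\theta_1)}{\partial\theta_1^T} \end{bmatrix} = 0.$$

The first matrix above is $F(\theta)$; the second matrix consists of $m - k$ linearly independent columns that lie in the null space of the row vectors of $F(\theta)$, i.e., the second matrix is merely a transformation of some matrix $U(\theta)$ defined as in (3.6).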
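The construction can be sketched on the unit circle, $f(\theta) = \theta_1^2 + \theta_2^2 - 1$ ($m = 2$, $k = 1$), an example chosen here for illustration. Splitting $F = [F_1\ F_2]$ with $F_2$ invertible, implicit differentiation gives $\partial\theta_2/\partial\theta_1^T = -F_2^{-1}F_1$, and the stacked matrix $[I;\ -F_2^{-1}F_1]$ is annihilated by $F$. At $\theta = (0.6, 0.8)$ the slope is $-0.75$, which a finite difference of the explicit branch $\theta_2 = \sqrt{1 - \theta_1^2}$ confirms. The helper `implicit_basis` is illustrative.

```python
import numpy as np

def implicit_basis(F, k):
    """For F = [F1 F2] with F2 (k x k) invertible, return [I; d theta_2 / d theta_1^T],
    where d theta_2 / d theta_1^T = -F2^{-1} F1 by implicit differentiation."""
    m = F.shape[1]
    F1, F2 = F[:, :m - k], F[:, m - k:]
    return np.vstack([np.eye(m - k), -np.linalg.solve(F2, F1)])

theta = np.array([0.6, 0.8])                   # a point on the unit circle
F = np.array([[2 * theta[0], 2 * theta[1]]])   # Jacobian of theta_1^2 + theta_2^2 - 1
B = implicit_basis(F, k=1)

# Finite-difference slope of the explicit branch theta_2(theta_1) = sqrt(1 - theta_1^2).
h = 1e-6
fd = (np.sqrt(1 - (theta[0] + h) ** 2) - np.sqrt(1 - (theta[0] - h) ** 2)) / (2 * h)
U = B / np.linalg.norm(B)                      # orthonormalized: a valid U(theta) for (3.6)
```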
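A.3 Alternative proof of asymptotic normality

Crowder [18] proved the following theorem.

Theorem A.2 (Crowder). If

1. there is a consistent solution $(\hat\theta_n, \hat\lambda_n)$ of the likelihood equations,

2. $s(x;\theta_0) \sim \mathcal{N}(0, I(\theta_0))$,

3. $D^{-1}(\hat\theta_n)\left(I(\theta_0) + \frac{\partial}{\partial\theta^T}s(x;\theta_0,\hat\theta_n)\right) \xrightarrow{p} 0$,

4. $Q(\hat\theta_n) - Q(\theta_0) \xrightarrow{p} 0$, and

5. $\det Q(\hat\theta_n) < K < \infty$,

6. $\det D^{-1}(\hat\theta_n)F^T(\hat\theta_n)\left(F(\theta_0,\hat\theta_n)D^{-1}(\hat\theta_n)F^T(\hat\theta_n)\right)^{-1} < K < \infty$,

where $s(x;\theta_0,\hat\theta_n)$ is a matrix in the form of the Fisher score $s(x;\theta)$ but with each row evaluated at possibly different points on the line between $\theta_0$ and $\hat\theta_n$, similarly for $F(\theta_0,\hat\theta_n)$, and $Q(\theta) = D^{-1}(\theta)F^T(\theta)\left(F(\theta_0,\theta)D^{-1}(\theta)F^T(\theta)\right)^{-1}$. Then

$$\sqrt{n}\left(\hat\theta - \theta\right) \xrightarrow{d} \mathcal{N}\left(0,\; D^{-1}(\theta) - D^{-1}(\theta)F^T(\theta)\left(F(\theta)D^{-1}(\theta)F^T(\theta)\right)^{-1}F(\theta)D^{-1}(\theta)\right),$$

where $D(\theta) = I(\theta) + F^T(\theta)KF(\theta)$ for an arbitrary positive semi-definite matrix $K$.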

Crowder's asymptotic normality theorem shows that the variance of the CMLE satisfies

$$\mathrm{Var}(\sqrt{n}\,\hat\theta_n) \to D^{-1}(\theta_0) - D^{-1}(\theta_0)F^T(\theta_0)\left(F(\theta_0)D^{-1}(\theta_0)F^T(\theta_0)\right)^{-1}F(\theta_0)D^{-1}(\theta_0)$$

as $n \to \infty$, where $D(\theta) = I(\theta) + F^T(\theta)KF(\theta)$. This asymptotic variance has exactly the structure of the Marzetta form of the CCRB when $K = 0$ and the FIM is full rank. Applying the algebraic identity of Lemma 3.8,

$$\mathrm{Var}(\sqrt{n}\,\hat\theta_n) \to U(\theta_0)\left(U^T(\theta_0)D(\theta_0)U(\theta_0)\right)^{-1}U^T(\theta_0).$$

It only remains to note that

$$U^T(\theta_0)D(\theta_0)U(\theta_0) = U^T(\theta_0)I(\theta_0)U(\theta_0),$$

since $F(\theta)U(\theta) = 0$.
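The two algebraic steps above can be checked numerically with random matrices (illustrative values only): the Crowder/Marzetta form built from $D = I + F^T K F$ equals $U(U^T D U)^{-1}U^T$, and because $FU = 0$ the $F^T K F$ term drops out of $U^T D U$, leaving the CCRB.

```python
import numpy as np

rng = np.random.default_rng(3)
m, k = 5, 2
F = rng.standard_normal((k, m))          # constraint Jacobian (full row rank)
B = rng.standard_normal((2 * m, m))
fim = B.T @ B                            # full-rank FIM, so the Marzetta form is defined
L = rng.standard_normal((k, k))
Kpsd = L @ L.T                           # arbitrary positive semidefinite K
D = fim + F.T @ Kpsd @ F

_, _, Vt = np.linalg.svd(F)
U = Vt[k:].T                             # orthonormal basis of null(F), as in (3.6)

Dinv = np.linalg.inv(D)
crowder = Dinv - Dinv @ F.T @ np.linalg.inv(F @ Dinv @ F.T) @ F @ Dinv
u_form = U @ np.linalg.inv(U.T @ D @ U) @ U.T
ccrb = U @ np.linalg.inv(U.T @ fim @ U) @ U.T
```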
Appendix B

Proofs of Convergence Properties of Constrained Scoring

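Theorem (Theorem 3.33). If for any iterate $\theta^{(k)} \in \Theta_f$ there does not exist an $\alpha^{(k)} > 0$ that satisfies (3.37), then $\theta^{(k)}$ is a stationary point.

Proof. Let $\theta \in \Theta_f$ and define $\theta(\alpha) = \pi\left[\theta + \alpha\,\mathrm{CCRB}(\theta)s(x;\theta)\right]$. By a property of the natural projection onto convex sets,

$$\left\|\theta(\alpha) - \theta\right\|_{I(\theta)} \le \alpha\left\|\mathrm{CCRB}(\theta)s(x;\theta)\right\|_{I(\theta)}.$$

Hence it is sufficient to show there exists an $\alpha > 0$ such that

$$\frac{\log p(x;\theta(\alpha)) - \log p(x;\theta)}{\alpha} \ge \kappa\, s^T(x;\theta)\,\mathrm{CCRB}(\theta)\,s(x;\theta).$$

To show this by contradiction, assume not and take the limit as $\alpha \to 0$. Then

$$s^T(x;\theta)\,\mathrm{CCRB}(\theta)\,s(x;\theta) \le \kappa\, s^T(x;\theta)\,\mathrm{CCRB}(\theta)\,s(x;\theta),$$

$$0 \le (\kappa - 1)\, s^T(x;\theta)\,\mathrm{CCRB}(\theta)\,s(x;\theta).$$

Since $\kappa < 1$, this inequality implies $\mathrm{CCRB}(\theta)s(x;\theta) = 0$ and $s(x;\theta) \in \mathrm{span}(F^T(\theta))$, so $\theta$ satisfies the stationarity condition (3.32). □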

Theorem (Theorem 3.34). The sequence $\{p(x;\theta^{(k)})\}$ is a monotone increasing sequence. Furthermore, if $p(x;\cdot)$ is bounded above, then $\{p(x;\theta^{(k)})\}$ converges.

Proof. Since $\kappa > 0$ and $\|\theta^{(k+1)} - \theta^{(k)}\|_{I(\theta^{(k)})} \ge 0$, by the rule in (3.37) the value of the likelihood function can only increase after each iteration. The second statement is a consequence of the monotone convergence principle, i.e., a bounded monotone sequence converges [38, p. 44, theorem 2-6]. □

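Theorem (Theorem 3.35). If the likelihood $p(x;\cdot)$ is bounded above, then the sequence

$$\left\{\log p(x;\theta^{(k+1)}) - \log p(x;\theta^{(k)})\right\}$$

vanishes.

Proof. Since $\{p(x;\theta^{(k)})\}$ converges, $\frac{p(x;\theta^{(k+1)})}{p(x;\theta^{(k)})} \to 1$.¹ And since $\log(\cdot)$ is continuous, $\log\frac{p(x;\theta^{(k+1)})}{p(x;\theta^{(k)})} \to 0$. □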

Theorem (Theorem 3.36). If the likelihood $p(x;\cdot)$ is bounded above, then the sequence

$$\left\{\|\theta^{(k+1)} - \theta^{(k)}\|_{I(\theta^{(k)})}\right\}$$

vanishes as $k \to \infty$.

Proof. Again, by the rule in (3.37), the sequence $\{\|\theta^{(k+1)} - \theta^{(k)}\|_{I(\theta^{(k)})}\}$ is bounded above by the product of a bounded sequence $\{\alpha^{(k)}\}$ and a vanishing sequence $\{\log p(x;\theta^{(k+1)}) - \log p(x;\theta^{(k)})\}$, and clearly each element of the sequence is nonnegative. Hence, by the squeezing theorem,² the sequence vanishes as $k \to \infty$. □


Theorem (Theorem 3.37). If $\Theta_f$ is compact and convex, then limit points of the sequence $\{\theta^{(k)}\}$ are also stationary points.

¹If $a_k \to a$, $b_k \to b$, $b_k \neq 0$ for every $k$, and $b \neq 0$, then $\frac{a_k}{b_k} \to \frac{a}{b}$ [38, p. 41, theorem 2-4(d)].

²If $0 \le a_k \le b_k$ and $b_k \to 0$, then $a_k \to 0$ [38, p. 43, theorem 2-5(c)].
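Proof. Let $\theta^*$ be a limit point of the sequence $\{\theta^{(k)}\}$. Then, by virtue of Bolzano-Weierstrass [38, p. 54, theorem 2-14], there exists a subsequence $\{\theta^{(k_i)}\}$ that converges to $\theta^*$. Additionally, since $\{\alpha^{(k_i)}\}$ is a bounded sequence, it contains a convergent subsequence $\{\alpha^{(k_{i_j})}\}$ with a limit point we shall denote $\alpha^*$. It is still true that $\theta^{(k_{i_j})} \to \theta^*$ [38, p. 49, theorem 2-10]. Then we can bound the norm-distance between $\pi[\theta^* + \alpha^*\mathrm{CCRB}(\theta^*)s(x;\theta^*)]$ and $\theta^*$ using the triangle inequality, e.g.,

$$\begin{aligned}
\left\|\pi[\theta^* + \alpha^*\mathrm{CCRB}(\theta^*)s(x;\theta^*)] - \theta^*\right\|_{I(\theta^*)}
&\le \left\|\pi[\theta^* + \alpha^*\mathrm{CCRB}(\theta^*)s(x;\theta^*)] - \pi[\theta^{(k_{i_j})} + \alpha^{(k_{i_j})}\mathrm{CCRB}(\theta^{(k_{i_j})})s(x;\theta^{(k_{i_j})})]\right\|_{I(\theta^*)} \\
&\quad + \left\|\pi[\theta^{(k_{i_j})} + \alpha^{(k_{i_j})}\mathrm{CCRB}(\theta^{(k_{i_j})})s(x;\theta^{(k_{i_j})})] - \theta^{(k_{i_j})}\right\|_{I(\theta^*)} + \left\|\theta^{(k_{i_j})} - \theta^*\right\|_{I(\theta^*)} \\
&\le \left\|\theta^* + \alpha^*\mathrm{CCRB}(\theta^*)s(x;\theta^*) - \theta^{(k_{i_j})} - \alpha^{(k_{i_j})}\mathrm{CCRB}(\theta^{(k_{i_j})})s(x;\theta^{(k_{i_j})})\right\|_{I(\theta^*)} \\
&\quad + \left\|\theta^{(k_{i_j}+1)} - \theta^{(k_{i_j})}\right\|_{I(\theta^*)} + \left\|\theta^{(k_{i_j})} - \theta^*\right\|_{I(\theta^*)}.
\end{aligned}$$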
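The second inequality is a result of a property of projections onto convex sets for the first term and the definition of our method of scoring with constraints for the second term. Note that this second term will vanish by theorem 3.36 as $k_{i_j} \to \infty$. Also, the third term will vanish as $k_{i_j} \to \infty$ since $\theta^*$ is the limit of the sequence $\{\theta^{(k_{i_j})}\}$. The first term is bounded, using the triangle inequality again, as in

$$\begin{aligned}
&\left\|\theta^* + \alpha^*\mathrm{CCRB}(\theta^*)s(x;\theta^*) - \theta^{(k_{i_j})} - \alpha^{(k_{i_j})}\mathrm{CCRB}(\theta^{(k_{i_j})})s(x;\theta^{(k_{i_j})})\right\|_{I(\theta^*)} \\
&\quad \le \left\|\theta^{(k_{i_j})} - \theta^*\right\|_{I(\theta^*)} + \left\|\alpha^*\mathrm{CCRB}(\theta^*)s(x;\theta^*) - \alpha^{(k_{i_j})}\mathrm{CCRB}(\theta^{(k_{i_j})})s(x;\theta^{(k_{i_j})})\right\|_{I(\theta^*)}.
\end{aligned}$$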

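Again, this first term will vanish as $k_{i_j} \to \infty$. This last term, using the triangle inequality, satisfies

$$\begin{aligned}
&\left\|\alpha^*\mathrm{CCRB}(\theta^*)s(x;\theta^*) - \alpha^{(k_{i_j})}\mathrm{CCRB}(\theta^{(k_{i_j})})s(x;\theta^{(k_{i_j})})\right\|_{I(\theta^*)} \\
&\quad \le \left\|\alpha^*\mathrm{CCRB}(\theta^*)s(x;\theta^*) - \alpha^{(k_{i_j})}\mathrm{CCRB}(\theta^*)s(x;\theta^*)\right\|_{I(\theta^*)} + \left\|\alpha^{(k_{i_j})}\mathrm{CCRB}(\theta^*)s(x;\theta^*) - \alpha^{(k_{i_j})}\mathrm{CCRB}(\theta^{(k_{i_j})})s(x;\theta^{(k_{i_j})})\right\|_{I(\theta^*)} \\
&\quad \le \left|\alpha^* - \alpha^{(k_{i_j})}\right|\left\|\mathrm{CCRB}(\theta^*)s(x;\theta^*)\right\|_{I(\theta^*)} + \alpha^{(k_{i_j})}\left\|\mathrm{CCRB}(\theta^{(k_{i_j})})s(x;\theta^{(k_{i_j})}) - \mathrm{CCRB}(\theta^*)s(x;\theta^*)\right\|_{I(\theta^*)}.
\end{aligned}$$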


The second inequality uses the absolute homogeneity of norms, $\|c\,\mathbf{v}\| = |c|\,\|\mathbf{v}\|$ [19, p.170, theorem 6.9.2]. By compactness, $\|\mathrm{CCRB}(\theta^*)s(x;\theta^*)\|_{\mathcal{I}(\theta^*)}$ is bounded, so the first term above vanishes as $k_{i_j} \to \infty$ since $\alpha^*$ is the limit of the sequence $\{\alpha^{(k_{i_j})}\}$. Similarly, $\{|\alpha^{(k_{i_j})}|\}$ is a bounded sequence, and so the second term above vanishes as $k_{i_j} \to \infty$ since $\mathrm{CCRB}(\theta^{(k_{i_j})}) \to \mathrm{CCRB}(\theta^*)$ and $s(x;\theta^{(k_{i_j})}) \to s(x;\theta^*)$ by continuity [38, p.78, corollary 4-2]. Therefore,


$$\pi\left[\theta^* + \alpha^*\mathrm{CCRB}(\theta^*)s(x;\theta^*)\right] = \theta^*$$


and one of the following holds:
(a) $\alpha^* = 0$,
(b) $\mathrm{CCRB}(\theta^*)s(x;\theta^*) = \mathbf{0}$, or
(c) the step from $\theta^*$ to $\theta^* + \alpha^*\mathrm{CCRB}(\theta^*)s(x;\theta^*)$ is perpendicular to $\Theta_f$ at $\theta^*$.

Case (c) is impossible since the step is directed by linear combinations of the columns of $\mathbf{U}(\theta^*)$, which are tangent to the constraint space at $\theta^*$. Case (a) implies stationarity by applying continuity to the step-size rule condition (3.37) and then theorem 3.33. Case (b) implies that $s(x;\theta^*)$ is a linear combination of the columns of $\mathbf{F}^T(\theta^*)$, i.e., $\theta^*$ satisfies (3.32). Therefore, $\theta^*$ is a stationary point. □
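The bound on the first term in the proof above rests on the fact that projecting onto a convex set never increases distances. The following minimal sketch checks that property numerically. It assumes the ordinary Euclidean norm (identity weighting rather than $\mathcal{I}(\theta^*)$) and uses a closed ball and a box as example convex sets; the helper names `project_ball` and `project_box` are illustrative only.

```python
import numpy as np

def project_ball(z, center, radius):
    """Euclidean projection onto the closed ball {x : ||x - center|| <= radius}."""
    d = z - center
    dist = np.linalg.norm(d)
    return z.copy() if dist <= radius else center + radius * d / dist

def project_box(z, lo, hi):
    """Euclidean projection onto the box {x : lo <= x_i <= hi} (componentwise clip)."""
    return np.clip(z, lo, hi)

rng = np.random.default_rng(0)
center = np.zeros(3)
worst = 0.0  # largest observed ratio ||pi(a) - pi(b)|| / ||a - b||
for _ in range(2000):
    a = 3.0 * rng.standard_normal(3)
    b = 3.0 * rng.standard_normal(3)
    for proj in (lambda z: project_ball(z, center, 1.0),
                 lambda z: project_box(z, -0.5, 0.5)):
        ratio = np.linalg.norm(proj(a) - proj(b)) / np.linalg.norm(a - b)
        worst = max(worst, ratio)
print(f"max ||pi(a) - pi(b)|| / ||a - b|| = {worst:.4f}")
```

The ratio never exceeds one, which is exactly the inequality used to replace the difference of projections by the difference of their arguments.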

Theorem (Theorem 3.38). If $\Theta_{\theta^{(1)}}$ is compact for all sequences in a closed set of $\Theta_f$, and if there is a unique limit point $\theta^*$ for all such sequences, then $\lim_{k\to\infty}\theta^{(k)} = \theta^*$ for every sequence. Also, $\theta^*$ is the maximum of $p(x;\cdot)$.

Proof. Let $\theta^{(1)}$ be any point in the compact set $\Theta_{\theta^{(1)}}$. Since $\{\theta^{(k)}\}$ resides in a compact set, it has a limit point (Bolzano-Weierstrass [38, p.52, theorem 2-12]), which must be unique; therefore $\lim_{k\to\infty}\theta^{(k)} = \theta^*$. By theorem 3.34, $p(x;\theta^{(k+1)}) \ge p(x;\theta^{(k)})$, and by continuity $p(x;\theta^{(k)}) \to p(x;\theta^*)$. Hence, $p(x;\theta^*) \ge p(x;\theta)$ for every $\theta \in \Theta_{\theta^{(1)}}$. □
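To make the iteration analysed in theorems 3.36 to 3.38 concrete, here is a toy run of the method of scoring with constraints, $\theta^{(k+1)} = \pi[\theta^{(k)} + \alpha^{(k)}\mathrm{CCRB}(\theta^{(k)})s(x;\theta^{(k)})]$. It is a sketch under assumptions not taken from this appendix: a Gaussian mean model $x_n \sim \mathcal{N}(\theta, \mathbf{I}_2)$, the linear constraint $\theta_1 + \theta_2 = 1$, a fixed step $\alpha^{(k)} = 1$ in place of the step-size rule (3.37), and the form $\mathrm{CCRB}(\theta) = \mathbf{U}(\mathbf{U}^T\mathcal{I}(\theta)\mathbf{U})^{-1}\mathbf{U}^T$ with $\mathbf{U}$ an orthonormal basis of the null space of the constraint gradient. The function names are hypothetical.

```python
import numpy as np

def ccrb(fim, U):
    """Assumed CCRB form U (U^T J U)^{-1} U^T, with U spanning the constraint tangent space."""
    return U @ np.linalg.solve(U.T @ fim @ U, U.T)

def project_affine(z, a, b):
    """Euclidean projection onto the hyperplane {theta : a^T theta = b}."""
    return z - a * (a @ z - b) / (a @ a)

def log_lik(theta, x):
    """Gaussian log-likelihood (unit covariance), up to an additive constant."""
    return -0.5 * np.sum((x - theta) ** 2)

rng = np.random.default_rng(1)
n = 50
x = rng.normal(loc=[0.3, 0.7], scale=1.0, size=(n, 2))  # true mean is feasible
a, b = np.array([1.0, 1.0]), 1.0                       # constraint a^T theta = b
U = np.array([[1.0], [-1.0]]) / np.sqrt(2.0)           # orthonormal basis of null(a^T)
fim = n * np.eye(2)                                    # Fisher information of n samples

theta = project_affine(np.array([5.0, -2.0]), a, b)    # feasible starting point
history = [log_lik(theta, x)]
for k in range(5):
    score = np.sum(x - theta, axis=0)                  # s(x; theta)
    alpha = 1.0                                        # fixed step (assumption)
    theta = project_affine(theta + alpha * ccrb(fim, U) @ score, a, b)
    history.append(log_lik(theta, x))

closed_form = project_affine(x.mean(axis=0), a, b)     # constrained MLE for this model
print(theta, closed_form)
```

For this linear model the first step already lands on the constrained maximum-likelihood estimate (the projection of the sample mean onto the constraint line), and the recorded log-likelihood never decreases, in line with theorem 3.34.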
Appendix C


Proofs  of  Theorems  in  Chapter  4 

Theorem (Theorem 4.7). The CFIM is singular, and the dimension of its null space is lower bounded as

$$\operatorname{nullity}(\mathcal{I}(\vartheta)) \ge \sum_{i=1}^{K}\sum_{j=1}^{K}(L_i - L_j + 1)^+, \tag{C.1}$$

where $(a)^+ = a$ for $a \ge 0$ and $(a)^+ = 0$ for $a < 0$. This lower bound is the nullity lower bound (NLB).
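Because the NLB in (C.1) depends only on the channel orders, it can be tabulated directly. The helper below (a hypothetical name, for illustration) evaluates the double sum.

```python
def nullity_lower_bound(orders):
    """NLB of (C.1): sum over all ordered pairs (i, j) of (L_i - L_j + 1)^+."""
    return sum(max(Li - Lj + 1, 0) for Li in orders for Lj in orders)

print(nullity_lower_bound([1, 2]))     # 4: pairs contribute 1 + 0 + 2 + 1
print(nullity_lower_bound([3, 3, 3]))  # 9: equal orders give K^2
```

When all channel orders are equal, every pair contributes one, so the bound reduces to $K^2$, the dimension of an instantaneous $K \times K$ mixing ambiguity.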


Proof. This proof is by construction. We shall develop a null subspace of the submatrix $[\mathbf{Q}_i\ \mathbf{Q}_j]$ of $\mathbf{Q}$. In particular, consider the submatrix of $\mathbf{Q}$ consisting of the elements associated with the $i$th source and the $j$th channel corresponding to the $m$th receiver, i.e., $[\mathbf{S}^{(i)}\ \mathbf{H}^{(j)}_m]$. There are three cases to consider: (1) $L_i = L_j$, (2) $L_i > L_j$, and (3) $L_i < L_j$. But first, for use in the proof, define the $M(L_i+1)\times(L_i-L_j+1)^+$ matrix

$$\mathcal{H}^{(j)} = \begin{bmatrix}\mathcal{H}^{(j)}_1\\ \vdots\\ \mathcal{H}^{(j)}_M\end{bmatrix}, \tag{C.2}$$
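where $\mathcal{H}^{(j)}_m$ is the $(L_i+1)\times(L_i-L_j+1)^+$ banded Toeplitz matrix

$$\mathcal{H}^{(j)}_m = \begin{bmatrix}
h^{(j)}_m(0) & 0 & \cdots & 0\\
h^{(j)}_m(1) & h^{(j)}_m(0) & \cdots & 0\\
\vdots & h^{(j)}_m(1) & \ddots & \vdots\\
h^{(j)}_m(L_j) & \vdots & \ddots & h^{(j)}_m(0)\\
0 & h^{(j)}_m(L_j) & & h^{(j)}_m(1)\\
\vdots & & \ddots & \vdots\\
0 & 0 & \cdots & h^{(j)}_m(L_j)
\end{bmatrix}. \tag{C.3}$$

Then $\mathcal{H}^{(j)}_m = \mathbf{h}^{(j)}_m$ and $\mathcal{H}^{(j)} = \mathbf{h}^{(j)}$ from the secondary vector-matrix model in (4.6) if $L_i = L_j$. Also define the $(N+L_j)\times(L_i-L_j+1)^+$ matrix

$$\mathcal{S}^{(i)} = \begin{bmatrix}
s^{(i)}(-L_j) & s^{(i)}(-L_j-1) & \cdots & s^{(i)}(-L_i)\\
s^{(i)}(-L_j+1) & s^{(i)}(-L_j) & \cdots & s^{(i)}(-L_i+1)\\
\vdots & \vdots & \ddots & \vdots\\
s^{(i)}(N-1) & s^{(i)}(N-2) & \cdots & s^{(i)}(N-1-L_i+L_j)
\end{bmatrix}.$$

Then $\mathcal{S}^{(i)} = \mathbf{s}^{(i)}$ if $L_i = L_j$. Now consider the cases: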


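(1) From the dual interpretation of the models in (4.3) and (4.6), it is clear that $\mathbf{H}^{(j)}_m\mathbf{s}^{(i)} = \mathbf{S}^{(i)}\mathbf{h}^{(j)}_m$; hence,

$$\left[\mathbf{I}_M\otimes\mathbf{S}^{(i)}\ \ \mathbf{H}^{(j)}\right]\begin{bmatrix}\mathbf{h}^{(j)}\\ -\mathbf{s}^{(i)}\end{bmatrix} = \mathbf{0}.$$

Also, $\dim\operatorname{span}\{[\mathbf{h}^{(j)T}\ \ -\mathbf{s}^{(i)T}]^T\} = 1 = (L_i-L_j+1)^+$ unless $\mathbf{h}^{(j)}$ and $\mathbf{s}^{(i)}$ are both zero, in which case the submatrix itself is zero. Either way, $\operatorname{nullity}([\mathbf{I}_M\otimes\mathbf{S}^{(i)}\ \ \mathbf{H}^{(j)}]) \ge (L_i-L_j+1)^+$.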


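(2) $\mathcal{H}^{(j)}$ in (C.2)-(C.3) has rank $(L_i-L_j+1)$ unless $\mathbf{h}^{(j)} = \mathbf{0}$, which is impossible by definition. Likewise, $\mathcal{S}^{(i)}$ has full column rank unless $\mathbf{s}^{(i)}$ has fewer than $(L_i-L_j+1)$ modes or $N < L_i - 2L_j + 1$ (see theorem 4.3). Therefore, since

$$\left[\mathbf{I}_M\otimes\mathbf{S}^{(i)}\ \ \mathbf{H}^{(j)}\right]\begin{bmatrix}\mathcal{H}^{(j)}\\ -\mathcal{S}^{(i)}\end{bmatrix} = \mathbf{0},$$

it follows that $\operatorname{nullity}([\mathbf{I}_M\otimes\mathbf{S}^{(i)}\ \ \mathbf{H}^{(j)}]) \ge (L_i-L_j+1)^+$.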
(3) Finally, since $L_i - L_j + 1 \le 0$, $\operatorname{nullity}([\mathbf{I}_M\otimes\mathbf{S}^{(i)}\ \ \mathbf{H}^{(j)}]) \ge 0 = (L_i-L_j+1)^+$.


□ 
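The construction above can be checked numerically. The sketch below builds $\mathbf{H}^{(j)}$, $\mathbf{S}^{(i)}$, $\mathcal{H}^{(j)}$ and $\mathcal{S}^{(i)}$ for random real-valued channels and sources, verifies the identity $\mathbf{H}^{(j)}\mathcal{S}^{(i)} = (\mathbf{I}_M\otimes\mathbf{S}^{(i)})\mathcal{H}^{(j)}$ behind cases (1) and (2), and compares the nullity of the stacked Jacobian with the NLB. The row and column orderings (receivers stacked, source samples indexed from $-L_j$), the Jacobian layout $\mathbf{Q} = [\mathbf{I}_M\otimes\mathbf{S}^{(1)}, \mathbf{H}^{(1)}, \ldots]$, and white Gaussian noise (so the FIM shares the null space of $\mathbf{Q}$) are assumptions of this sketch, not statements of (4.3) or (4.6).

```python
import numpy as np

def filtering_matrix(h, N):
    """H^{(j)}: M stacked N x (N + L_j) blocks; column n' + L_j multiplies s(n')."""
    M, L1 = h.shape                      # h[m, l] = h_m(l)
    H = np.zeros((M * N, N + L1 - 1))
    for m in range(M):
        for n in range(N):
            H[m * N + n, n:n + L1] = h[m, ::-1]   # h_m(L_j), ..., h_m(0)
    return H

def source_matrix(s, Li, N):
    """S^{(i)}: row n = [s(n), ..., s(n - L_i)]; s[t + L_i] stores s(t)."""
    return np.array([[s[n - c + Li] for c in range(Li + 1)] for n in range(N)])

def script_S(s, Li, Lj, N):
    """(N + L_j) x (L_i - L_j + 1)^+ matrix with entry [r, c] = s(r - L_j - c)."""
    q = max(Li - Lj + 1, 0)
    out = np.zeros((N + Lj, q))
    for r in range(N + Lj):
        for c in range(q):
            out[r, c] = s[r - Lj - c + Li]
    return out

def script_H(h, Li):
    """M(L_i + 1) x (L_i - L_j + 1)^+ matrix of (C.2)-(C.3)."""
    M, L1 = h.shape
    q = max(Li - L1 + 2, 0)              # (L_i - L_j + 1)^+
    out = np.zeros((M * (Li + 1), q))
    for m in range(M):
        for c in range(q):
            start = m * (Li + 1) + c
            out[start:start + L1, c] = h[m]
    return out

def nlb(orders):
    return sum(max(a - b + 1, 0) for a in orders for b in orders)

rng = np.random.default_rng(0)
M, N, orders = 3, 40, [1, 2]
chans = [rng.standard_normal((M, L + 1)) for L in orders]
srcs = [rng.standard_normal(N + L) for L in orders]

identity_ok = True
for i, Li in enumerate(orders):
    for j, Lj in enumerate(orders):
        lhs = filtering_matrix(chans[j], N) @ script_S(srcs[i], Li, Lj, N)
        rhs = np.kron(np.eye(M), source_matrix(srcs[i], Li, N)) @ script_H(chans[j], Li)
        identity_ok = identity_ok and np.allclose(lhs, rhs)

blocks = []
for L, h, s in zip(orders, chans, srcs):
    blocks += [np.kron(np.eye(M), source_matrix(s, L, N)), filtering_matrix(h, N)]
Q = np.hstack(blocks)
nullity = Q.shape[1] - np.linalg.matrix_rank(Q)
print(identity_ok, nullity, nlb(orders))
```

Only the inequality $\operatorname{nullity} \ge$ NLB is guaranteed by theorem 4.7; equality for such random data is what the sufficiency conditions of theorem 4.14 address.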

Theorem (CFIM NLB necessary conditions, Theorem 4.13). The $M$-channel $K$-source FIR system Fisher information matrix has a nullity of exactly the NLB in (4.13) only if

(a) $\mathbf{H}(z)$ is irreducible and column-reduced,
(b) $p_{\text{total}} \ge K + \sum_{j=1}^{K} L_j$,
(c) $p_k \ge L_k + 2$ for $k = 1,\ldots,K$, or $p_k \ge 1$ if $L_k = 0$,
(d) $N > K + \sum_{j=1}^{K} L_j$, and
(e) $M > K$.

Proof. If any of these conditions fails, then the dimension of the null space of $\mathcal{I}(\vartheta)$ exceeds the NLB.


s  otherwise  E&y  =  0  and  the  model  itself  is  not  identifiable.)  Then 

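$$\mathbf{v}^* = \begin{bmatrix}\mathbf{0}^{(1)}\\ \mathbf{y}^{(1)}\\ \vdots\\ \mathbf{0}^{(K)}\\ \mathbf{y}^{(K)}\end{bmatrix} \in \operatorname{null}(\mathcal{I}(\vartheta)),$$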
where $\mathbf{0}^{(k)}$ is an $M(L_k+1)$ length zero vector. Assume $\mathbf{v}^*$ is a linear combination of the columns of $\mathcal{N}$ in (4.14). Then for some $k$ the columns of $[\mathcal{H}^{(1)}_{(k)}\ \mathcal{H}^{(2)}_{(k)}\ \cdots\ \mathcal{H}^{(K)}_{(k)}]$ must be linearly dependent. (Otherwise, $\mathbf{v}^* \notin \operatorname{span}(\mathcal{N})$; this corresponds to the $\mathbf{0}^{(k)}$ subvector in $\mathbf{v}^*$.) Let


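$$\begin{bmatrix}\mathbf{u}^{(1)}_{(k)}\\ \mathbf{u}^{(2)}_{(k)}\\ \vdots\\ \mathbf{u}^{(K)}_{(k)}\end{bmatrix} \in \operatorname{null}\left(\left[\mathcal{H}^{(1)}_{(k)}\ \ \mathcal{H}^{(2)}_{(k)}\ \cdots\ \mathcal{H}^{(K)}_{(k)}\right]\right)$$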


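with $\mathbf{u}^{(j)}_{(k)}$ a $(L_k - L_j + 1)^+$ length vector. Then

$$\begin{bmatrix}\mathbf{U}^{(1)}_{(k)}\\ \vdots\\ \mathbf{U}^{(K)}_{(k)}\end{bmatrix} \in \operatorname{null}\left(\left[\mathbf{H}^{(1)}\ \cdots\ \mathbf{H}^{(K)}\right]\right),$$

where

$$\mathbf{U}^{(j)}_{(k)} = \begin{bmatrix}
\mathbf{u}^{(j)T}_{(k)} & 0 & \cdots & 0\\
0 & \mathbf{u}^{(j)T}_{(k)} & \cdots & 0\\
\vdots & & \ddots & \vdots\\
0 & \cdots & 0 & \mathbf{u}^{(j)T}_{(k)}
\end{bmatrix}_{(N+L_j)\times(N+L_k)}.$$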

(If $L_j > L_k$, then $\mathbf{U}^{(j)}_{(k)}$ is a null matrix. Also, $\mathbf{U}^{(k)}_{(k)} = u^{(k)}_{(k)}\mathbf{I}_{(N+L_k)\times(N+L_k)}$.) The matrix $\mathbf{U}_{(k)}$ can be arranged to create $N+L_k$ linearly independent columns in the null space of $\mathcal{I}(\vartheta)$, where the submatrices $\mathbf{U}^{(j)}_{(k)}$ correspond to the rows of $\mathcal{N}$ containing $[\mathcal{S}^{(1)}_{(k)}\ \mathcal{S}^{(2)}_{(k)}\ \cdots\ \mathcal{S}^{(K)}_{(k)}]$, which has rank at most $\sum_{j=1}^{K}(L_j - L_k + 1)^+ \le \sum_{j=1}^{K}(L_j + 1)$. For the columns of $\mathcal{N}$ to be a basis of the null space of $\mathcal{I}(\vartheta)$, it is then needed that

$$\sum_{j=1}^{K}(L_j + 1) \ge \sum_{j=1}^{K}(L_j - L_k + 1)^+ \ge N + L_k,$$

which implies at most $N = K + \sum_{j=1}^{K} L_j$, contradicting theorem 4.12 unless all the channel orders are zero. In this latter case, note that

$$\mathbf{I}_{K\times K}\otimes\begin{bmatrix}\mathbf{u}^{(1)}_{(k)}\\ \mathbf{u}^{(2)}_{(k)}\\ \vdots\\ \mathbf{u}^{(K)}_{(k)}\end{bmatrix}$$

will not be full rank, and hence neither can $\mathcal{N}$ be.


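(b) If $p_{\text{total}} < K + \sum_{k=1}^{K} L_k$, then there exists $\mathbf{v} \in \operatorname{null}(\mathbf{S})$, where $\mathbf{S} = [\mathbf{S}^{(1)}\ \mathbf{S}^{(2)}\ \cdots\ \mathbf{S}^{(K)}]$. If $\mathbf{v}$ is partitioned as

$$\mathbf{v} = \begin{bmatrix}\mathbf{v}^{(1)}\\ \mathbf{v}^{(2)}\\ \vdots\\ \mathbf{v}^{(K)}\end{bmatrix},$$

where $\mathbf{v}^{(k)}$ is an $L_k+1$ length vector, then

$$\mathbf{v}^* = \begin{bmatrix}\mathbf{1}_{M\times1}\otimes\mathbf{v}^{(1)}\\ \mathbf{0}^{(1)}\\ \vdots\\ \mathbf{1}_{M\times1}\otimes\mathbf{v}^{(K)}\\ \mathbf{0}^{(K)}\end{bmatrix} \in \operatorname{null}(\mathcal{I}(\vartheta)),$$

with $\mathbf{0}^{(k)}$ an $N+L_k$ length zero vector. If $\mathbf{v}^* \in \operatorname{span}(\mathcal{N})$, then for some $k$ we have $\operatorname{rank}([\mathcal{S}^{(1)}_{(k)}\ \mathcal{S}^{(2)}_{(k)}\ \cdots\ \mathcal{S}^{(K)}_{(k)}]) < \sum_{l=1}^{K}(L_l - L_k + 1)^+$. Then, by a construction similar to part (a), $\operatorname{nullity}(\mathbf{S}) \ge L_k + 1$, and this contribution to the null space of $\mathcal{I}(\vartheta)$ has rank at least $M(L_k+1)$, with submatrices that coincide with $[\mathcal{H}^{(1)}_{(k)}\ \mathcal{H}^{(2)}_{(k)}\ \cdots\ \mathcal{H}^{(K)}_{(k)}]$, which has rank at most

$$\sum_{l=1}^{K}(L_k - L_l + 1)^+ \le \sum_{l=1}^{K}(L_k + 1) = K(L_k+1) < M(L_k+1)$$

for any $L_k$, since $M > K$. Therefore,

$$\operatorname{nullity}(\mathcal{I}(\vartheta)) > \sum_{i=1}^{K}\sum_{j=1}^{K}(L_i - L_j + 1)^+.$$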
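(c) If $p_k < L_k + 1$, then by [31, lemma 1] $\mathbf{S}^{(k)}$, and consequently $\mathbf{S}$, has a null space, and the argument in (b) applies. So assume $p_k = L_k + 1$. If $N < L_k + 1$ (and $L_k \ne 0$), then $\mathbf{S}^{(k)}$ has a null space [76, lemma 1], so assume $N \ge L_k + 1$. From [31, the proof of theorem 1], it is possible to construct a $\mathbf{v}$ independent of $\mathbf{s}^{(k)}$ such that $\operatorname{span}(\mathbf{V}) = \operatorname{span}(\mathbf{S}^{(k)})$, where $\mathbf{V}$ is built from $\mathbf{v}$ in the same way as $\mathbf{S}^{(k)}$ is built from $\mathbf{s}^{(k)}$. So for any $\mathbf{h}^{(k)}$ there exists an $\mathbf{h}^*$ such that $\mathbf{H}^{(k)}\mathbf{v} = (\mathbf{I}_{M\times M}\otimes\mathbf{V})\mathbf{h}^{(k)} = (\mathbf{I}_{M\times M}\otimes\mathbf{S}^{(k)})\mathbf{h}^*$. Therefore both

$$\begin{bmatrix}-\mathbf{v}\\ \mathbf{h}^*\end{bmatrix} \quad\text{and}\quad \begin{bmatrix}-\mathbf{s}^{(k)}\\ \mathbf{h}^{(k)}\end{bmatrix}$$

reside in $\operatorname{null}\left([\mathbf{H}^{(k)}\ \ (\mathbf{I}_{M\times M}\otimes\mathbf{S}^{(k)})]\right)$, which increases the nullity lower bound by at least one.

(d) This is a looser bound than the one in theorem 4.12 and hence is required by it.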


□ 
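Part (b) can be illustrated with a single source ($K = 1$, so the NLB is 1) whose samples contain only one mode. The sketch below uses a constant source, for which both columns of $\mathbf{S}^{(1)}$ are all ones, lifts the null vector $\mathbf{v} = [1, -1]^T$ of $\mathbf{S}^{(1)}$ to $\mathbf{v}^* = [\mathbf{1}_{M\times1}\otimes\mathbf{v};\ \mathbf{0}]$, and compares the Jacobian nullity with that of a random source. The matrix orderings and the Jacobian layout $[\mathbf{I}_M\otimes\mathbf{S}^{(1)},\ \mathbf{H}^{(1)}]$ are assumptions of this sketch, and the helper names are hypothetical.

```python
import numpy as np

def filtering_matrix(h, N):
    """Stacked N x (N + L) convolution blocks, one per receiver (assumed ordering)."""
    M, L1 = h.shape
    H = np.zeros((M * N, N + L1 - 1))
    for m in range(M):
        for n in range(N):
            H[m * N + n, n:n + L1] = h[m, ::-1]
    return H

def source_matrix(s, L, N):
    """Row n = [s(n), ..., s(n - L)], with s[t + L] storing s(t)."""
    return np.array([[s[n - c + L] for c in range(L + 1)] for n in range(N)])

def jacobian_nullity(h, s, N):
    M, L1 = h.shape
    Q = np.hstack([np.kron(np.eye(M), source_matrix(s, L1 - 1, N)), filtering_matrix(h, N)])
    return Q, Q.shape[1] - np.linalg.matrix_rank(Q)

rng = np.random.default_rng(2)
M, L, N = 2, 1, 20
h = rng.standard_normal((M, L + 1))

s_const = np.ones(N + L)                  # one mode: S^{(1)} has rank 1 < L + 1
Q_const, nullity_const = jacobian_nullity(h, s_const, N)
v = np.array([1.0, -1.0])                 # S^{(1)} v = 0 for the constant source
v_star = np.concatenate([np.kron(np.ones(M), v), np.zeros(N + L)])

s_rand = rng.standard_normal(N + L)
_, nullity_rand = jacobian_nullity(h, s_rand, N)
print(nullity_const, nullity_rand)
```

The lifted vector is annihilated by the Jacobian and is independent of the scaling ambiguity $[\mathbf{h};\ -\mathbf{s}]$, so the nullity for the constant source exceeds the NLB.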

Theorem (CFIM NLB sufficiency conditions, theorem 4.14). The $M$-channel $K$-source FIR system FIM has a nullity of exactly the NLB in (4.13) if

(a) $\mathbf{H}(z)$ is irreducible and column-reduced,
(b) $p_{\text{total}} > K + (K+1)\sum_{j=1}^{K} L_j$,
(c) $p_k > L_k + 1 + \sum_{j=1}^{K} L_j$ for $k = 1,\ldots,K$,
(d) $N > K + (K+2)\sum_{j=1}^{K} L_j$, and
(e) $M > K$.
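Conditions (b) to (e) are plain arithmetic on the channel orders, mode counts, record length and number of receivers, so they are easy to screen before running a numerical study. The sketch below encodes them with the strict inequalities as printed above; condition (a) is not checked, $p_{\text{total}}$ is passed in separately rather than derived from the $p_k$, and the function name is hypothetical.

```python
def sufficiency_conditions(orders, modes, p_total, N, M):
    """Numeric conditions (b)-(e) of theorem 4.14; condition (a) is NOT checked."""
    K = len(orders)
    n_star = sum(orders)                       # sum of channel orders L_j
    return {
        "b": p_total > K + (K + 1) * n_star,
        "c": all(p > L + 1 + n_star for p, L in zip(modes, orders)),
        "d": N > K + (K + 2) * n_star,
        "e": M > K,
    }

ok = sufficiency_conditions([1, 2], [10, 10], 20, 40, 3)
short = sufficiency_conditions([1, 2], [10, 10], 20, 14, 3)
print(ok)
print(short)
```

With orders $[1, 2]$ the sum is 3, so condition (d) requires $N > 14$; the second call fails only that test.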


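This result is conceptually easier to prove from yet another alternative matrix model, distinct from the ones in (4.3) and (4.6), using $\mathbf{S}(n)$ as defined in (4.9). Define the $(L_k+1+n)\times(n+1)$ matrix

$$\mathbf{H}^{(k)}_m(n) = \begin{bmatrix}
h^{(k)}_m(0) & & \\
\vdots & \ddots & \\
h^{(k)}_m(L_k) & & h^{(k)}_m(0)\\
 & \ddots & \vdots\\
 & & h^{(k)}_m(L_k)
\end{bmatrix},$$

which is the impulse response for the $m$th subchannel of the $k$th source. Then define $\mathbf{H}^{(k)}(n) = [\mathbf{H}^{(k)}_1(n), \ldots, \mathbf{H}^{(k)}_M(n)]$ and the $(K(n+1) + \sum_{k=1}^{K}L_k)\times M(n+1)$ matrix

$$\mathbf{H}(n) = \begin{bmatrix}\mathbf{H}^{(1)}(n)\\ \vdots\\ \mathbf{H}^{(K)}(n)\end{bmatrix}.$$

The observations for the $m$th channel (receiver) can be collected into the $(N-n)\times(n+1)$ matrix

$$\mathbf{Y}_{(m)}(n) = \begin{bmatrix}
y_m(n) & \cdots & y_m(0)\\
\vdots & \ddots & \vdots\\
y_m(N-1) & \cdots & y_m(N-1-n)
\end{bmatrix},$$

with $\mathbf{Y}(n) = [\mathbf{Y}_{(1)}(n), \ldots, \mathbf{Y}_{(M)}(n)]$. The noise matrix $\mathbf{W}(n)$ is defined similarly. Then the alternative model is

$$\mathbf{Y}(n) = \mathbf{S}(n)\mathbf{H}(n) + \mathbf{W}(n). \tag{C.5}$$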

Before this theorem is proven, a lemma is needed. This lemma is a generalization of a result in [52, theorem 3]; the proof was originally shown in [49, Appendix].

Lemma C.1. Assume $\mathbf{H}(n^*)$ is full row rank, let $\mathbf{h}'^{(k)}$ be any nontrivial $M(L_k+1)$ length vector, and define $\mathbf{H}'^{(k)}(n^*)$ to be the $(L_k+1+n^*)\times M(1+n^*)$ matrix composed from $\mathbf{h}'^{(k)}$ as in (4.4) and (4.5). Then the following two statements are equivalent:

(i) $\operatorname{corange}\{\mathbf{H}'^{(k)}(n^*)\} \subseteq \operatorname{corange}\{\mathbf{H}^{(1)}(n^*), \ldots, \mathbf{H}^{(K)}(n^*)\} = \operatorname{corange}\{\mathbf{H}(n^*)\}$.
(ii) $\mathbf{h}'^{(k)} \in \operatorname{range}\{\mathcal{H}^{(1)}_{(k)}, \ldots, \mathcal{H}^{(K)}_{(k)}\}$.


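We shall prove a more general version of the lemma, which is essentially an extension of the identifiability theorem of [52, theorem 3] from the SIMO to the MIMO scenario. First, define $\mathbf{h}^{(j)}(l) = [h^{(j)}_1(l), \ldots, h^{(j)}_M(l)]^T$ and the $MN\times(N+L_j)$ matrix

$$\mathbf{H}^{(j)}(N) = \begin{bmatrix}
\mathbf{h}^{(j)}(0) & \mathbf{h}^{(j)}(1) & \cdots & \mathbf{h}^{(j)}(L_j) & & \\
 & \ddots & \ddots & & \ddots & \\
 & & \mathbf{h}^{(j)}(0) & \mathbf{h}^{(j)}(1) & \cdots & \mathbf{h}^{(j)}(L_j)
\end{bmatrix},$$

so that $\mathbf{H}(N) = [\mathbf{H}^{(1)}(N), \ldots, \mathbf{H}^{(K)}(N)]$ is similar to $\mathbf{H}$ in the original model (4.3), except for the order of the rows. Let $L$ satisfy $\min_j\{L_j\} \le L \le \max_j\{L_j\}$. Also define $\mathcal{H}^{(j)}_L$ as the $M(L+1)\times(L-L_j+1)^+$ matrix whose $c$th column, $c = 0, \ldots, L-L_j$, is $[\mathbf{0}_{1\times Mc}\ \ \mathbf{h}^{(j)T}\ \ \mathbf{0}_{1\times M(L-L_j-c)}]^T$, where $\mathbf{h}^{(j)} = [\mathbf{h}^{(j)}(0)^T, \ldots, \mathbf{h}^{(j)}(L_j)^T]^T$ is an $M(L_j+1)\times1$ vector. Note that $\mathcal{H}^{(j)}_L$ is null for $L < L_j$.

We now restate the lemma: instead, we show that the following are equivalent:

(1) $\operatorname{Range}\{\mathbf{H}'(N)\} \subseteq \operatorname{Range}\{\mathbf{H}^{(1)}(N), \ldots, \mathbf{H}^{(K)}(N)\} = \operatorname{Range}\{\mathbf{H}(N)\}$.
(2) $\mathbf{h}' \in \operatorname{Range}\{[\mathcal{H}^{(1)}_L, \ldots, \mathcal{H}^{(K)}_L]\}$.

Proof. Assume $N > \sum_{j=1}^{K} L_j$ and $\mathbf{H}(N-1)$ is full column rank. Let $\mathbf{h}'$ be any $M(L+1)\times1$ nonzero complex vector, and define $\mathbf{H}'(N)$ to be the $MN\times(N+L)$ channel matrix composed from $\mathbf{h}'$. We first assume (1) and show (2). Note that

$$\mathbf{H}^{(j)}(N) = \begin{bmatrix}\mathbf{h}^{(j)}(0) & \mathbf{p}^{(j)}(N-1)\\ \mathbf{0}_{M(N-1)\times1} & \mathbf{H}^{(j)}(N-1)\end{bmatrix} = \begin{bmatrix}\mathbf{H}^{(j)}(N-1) & \mathbf{0}_{M(N-1)\times1}\\ \mathbf{q}^{(j)}(N-1) & \mathbf{h}^{(j)}(L_j)\end{bmatrix},$$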
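where

$$\mathbf{p}^{(j)}(N-1) = [\mathbf{h}^{(j)}(1), \ldots, \mathbf{h}^{(j)}(L_j), \mathbf{0}_{M\times(N-1)}], \qquad \mathbf{q}^{(j)}(N-1) = [\mathbf{0}_{M\times(N-1)}, \mathbf{h}^{(j)}(0), \ldots, \mathbf{h}^{(j)}(L_j-1)].$$

Statement (1) implies that the first column of $\mathbf{H}'(N)$ satisfies

$$\begin{bmatrix}\mathbf{h}'(0)\\ \mathbf{0}_{M(N-1)\times1}\end{bmatrix} = \sum_{j=1}^{K}\begin{bmatrix}\mathbf{h}^{(j)}(0) & \mathbf{p}^{(j)}(N-1)\\ \mathbf{0}_{M(N-1)\times1} & \mathbf{H}^{(j)}(N-1)\end{bmatrix}\begin{bmatrix}\alpha^{(j)}_0\\ \mathbf{a}^{(j)}_0\end{bmatrix},$$

where $\alpha^{(j)}_0$ is a constant and $\mathbf{a}^{(j)}_0$ is an $(N-1+L_j)\times1$ vector. Hence, we have the following two linear systems:

(A1) $\mathbf{h}'(0) = \sum_{j=1}^{K}\left[\alpha^{(j)}_0\mathbf{h}^{(j)}(0) + \mathbf{p}^{(j)}(N-1)\mathbf{a}^{(j)}_0\right]$,
(A2) $\mathbf{0}_{M(N-1)\times1} = \sum_{j=1}^{K}\mathbf{H}^{(j)}(N-1)\mathbf{a}^{(j)}_0$.

But $\mathbf{H}(N-1)$ is full column rank; thus $\mathbf{a}^{(j)}_0 = \mathbf{0}_{(N+L_j-1)\times1}$ for all $j$. Therefore,

$$\mathbf{h}'(0) = \sum_{j=1}^{K}\alpha^{(j)}_0\mathbf{h}^{(j)}(0). \tag{C.6}$$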

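Likewise, the next column of $\mathbf{H}'(N)$ is given by

$$\begin{bmatrix}\mathbf{h}'(1)\\ \mathbf{h}'(0)\\ \mathbf{0}_{M(N-2)\times1}\end{bmatrix} = \sum_{j=1}^{K}\begin{bmatrix}\mathbf{h}^{(j)}(0) & \mathbf{p}^{(j)}(N-1)\\ \mathbf{0}_{M(N-1)\times1} & \mathbf{H}^{(j)}(N-1)\end{bmatrix}\begin{bmatrix}\alpha^{(j)}_1\\ \mathbf{a}^{(j)}_1\end{bmatrix},$$

where, again, $\alpha^{(j)}_1$ is a constant and $\mathbf{a}^{(j)}_1$ is an $(N+L_j-1)\times1$ vector. Now, we have the following two systems:

(B1) $\mathbf{h}'(1) = \sum_{j=1}^{K}\left[\alpha^{(j)}_1\mathbf{h}^{(j)}(0) + \mathbf{p}^{(j)}(N-1)\mathbf{a}^{(j)}_1\right]$,
(B2) $\begin{bmatrix}\mathbf{h}'(0)\\ \mathbf{0}_{M(N-2)\times1}\end{bmatrix} = \sum_{j=1}^{K}\mathbf{H}^{(j)}(N-1)\mathbf{a}^{(j)}_1$.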
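Note that (C.6) implies

$$\begin{bmatrix}\mathbf{h}'(0)\\ \mathbf{0}_{M(N-2)\times1}\end{bmatrix} = \sum_{j=1}^{K}\alpha^{(j)}_0\begin{bmatrix}\mathbf{h}^{(j)}(0)\\ \mathbf{0}_{M(N-2)\times1}\end{bmatrix},$$

i.e., it is a linear combination of column vectors of $\mathbf{H}(N-1)$. Since $\mathbf{H}(N-1)$ is full column rank, (B2) implies that $\mathbf{a}^{(j)}_1 = [\alpha^{(j)}_0,\ \mathbf{0}_{1\times(N+L_j-2)}]^T$. Thus, evaluating (B1), we have

$$\mathbf{h}'(1) = \sum_{j=1}^{K}\left[\alpha^{(j)}_1\mathbf{h}^{(j)}(0) + \alpha^{(j)}_0\mathbf{h}^{(j)}(1)\right]. \tag{C.7}$$

Continuing in this fashion, we arrive at the general expression

$$\mathbf{h}'(l) = \sum_{j=1}^{K}\sum_{i=0}^{l}\alpha^{(j)}_{l-i}\mathbf{h}^{(j)}(i) \tag{C.8}$$

for $0 \le l \le L$. For convenience, we define $\mathbf{h}^{(j)}(i)$ to be null if $i > L_j$ or $i < 0$. Now, consider the $(L+2)$nd column of $\mathbf{H}'(N)$,

$$\begin{bmatrix}\mathbf{0}_{M\times1}\\ \mathbf{h}'(L)\\ \vdots\\ \mathbf{h}'(0)\\ \mathbf{0}_{M(N-L-2)\times1}\end{bmatrix} = \sum_{j=1}^{K}\begin{bmatrix}\mathbf{h}^{(j)}(0) & \mathbf{p}^{(j)}(N-1)\\ \mathbf{0}_{M(N-1)\times1} & \mathbf{H}^{(j)}(N-1)\end{bmatrix}\begin{bmatrix}\alpha^{(j)}_{L+1}\\ \mathbf{a}^{(j)}_{L+1}\end{bmatrix}.$$

It follows that

(C1) $\mathbf{0}_{M\times1} = \sum_{j=1}^{K}\left[\alpha^{(j)}_{L+1}\mathbf{h}^{(j)}(0) + \mathbf{p}^{(j)}(N-1)\mathbf{a}^{(j)}_{L+1}\right]$,
(C2) $\begin{bmatrix}\mathbf{h}'(L)\\ \vdots\\ \mathbf{h}'(0)\\ \mathbf{0}_{M(N-L-2)\times1}\end{bmatrix} = \sum_{j=1}^{K}\mathbf{H}^{(j)}(N-1)\mathbf{a}^{(j)}_{L+1}$.

Since $\mathbf{H}(N-1)$ is full column rank, equations (C.8) and (C2) imply that $\mathbf{a}^{(j)}_{L+1} = [\alpha^{(j)}_L, \ldots, \alpha^{(j)}_0,\ \mathbf{0}_{1\times(N+L_j-L-2)}]^T$. Applied to (C1), we have

$$\mathbf{0}_{M\times1} = \sum_{j=1}^{K}\left[\alpha^{(j)}_{L+1}\mathbf{h}^{(j)}(0) + \alpha^{(j)}_L\mathbf{h}^{(j)}(1) + \cdots + \alpha^{(j)}_0\mathbf{h}^{(j)}(L+1)\right].$$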
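Continuing in this fashion, we see that for $L+1 \le l \le N-1$,

$$\mathbf{0}_{M\times1} = \sum_{j=1}^{K}\sum_{i=0}^{l}\alpha^{(j)}_{l-i}\mathbf{h}^{(j)}(i). \tag{C.9}$$

In particular, (C.8) and (C.9) show that the $N$th column of $\mathbf{H}'(N)$ can be written as

$$\begin{bmatrix}\mathbf{0}_{M(N-L-1)\times1}\\ \mathbf{h}'(L)\\ \vdots\\ \mathbf{h}'(0)\end{bmatrix} = \sum_{j=1}^{K}\mathbf{H}^{(j)}(N)\begin{bmatrix}\alpha^{(j)}_{N-1}\\ \vdots\\ \alpha^{(j)}_0\\ \mathbf{0}_{L_j\times1}\end{bmatrix}. \tag{C.10}$$

Similarly, statement (1) implies that the last column of $\mathbf{H}'(N)$ satisfies

$$\begin{bmatrix}\mathbf{0}_{M(N-1)\times1}\\ \mathbf{h}'(L)\end{bmatrix} = \sum_{j=1}^{K}\begin{bmatrix}\mathbf{H}^{(j)}(N-1) & \mathbf{0}_{M(N-1)\times1}\\ \mathbf{q}^{(j)}(N-1) & \mathbf{h}^{(j)}(L_j)\end{bmatrix}\begin{bmatrix}\boldsymbol{\gamma}^{(j)}\\ \beta^{(j)}_0\end{bmatrix},$$

where $\boldsymbol{\gamma}^{(j)}$ is an $(N+L_j-1)\times1$ vector and $\beta^{(j)}_0$ a constant. Thus we have

(a1) $\mathbf{0}_{M(N-1)\times1} = \sum_{j=1}^{K}\mathbf{H}^{(j)}(N-1)\boldsymbol{\gamma}^{(j)}$,
(a2) $\mathbf{h}'(L) = \sum_{j=1}^{K}\left[\mathbf{q}^{(j)}(N-1)\boldsymbol{\gamma}^{(j)} + \beta^{(j)}_0\mathbf{h}^{(j)}(L_j)\right]$.

Here, since $\mathbf{H}(N-1)$ is full column rank, (a1) implies that $\boldsymbol{\gamma}^{(j)} = \mathbf{0}_{(N+L_j-1)\times1}$; hence

$$\mathbf{h}'(L) = \sum_{j=1}^{K}\beta^{(j)}_0\mathbf{h}^{(j)}(L_j).$$

Proceeding as before, it is clear that

$$\mathbf{h}'(L-l) = \sum_{j=1}^{K}\sum_{i=0}^{l}\beta^{(j)}_i\mathbf{h}^{(j)}(L_j-l+i)$$

for $0 \le l \le L$. Thus, the $N$th column of $\mathbf{H}'(N)$ can also be expressed as

$$\begin{bmatrix}\mathbf{0}_{M(N-L-1)\times1}\\ \mathbf{h}'(L)\\ \vdots\\ \mathbf{h}'(0)\end{bmatrix} = \sum_{j=1}^{K}\mathbf{H}^{(j)}(N)\begin{bmatrix}\mathbf{0}_{(N+L_j-L-1)\times1}\\ \beta^{(j)}_0\\ \vdots\\ \beta^{(j)}_L\end{bmatrix}. \tag{C.11}$$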
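But $\mathbf{H}(N)$ is full column rank; hence (C.10) and (C.11) imply that

$$\begin{bmatrix}\alpha^{(j)}_{N-1}\\ \vdots\\ \alpha^{(j)}_0\\ \mathbf{0}_{L_j\times1}\end{bmatrix} = \begin{bmatrix}\mathbf{0}_{(N+L_j-L-1)\times1}\\ \beta^{(j)}_0\\ \vdots\\ \beta^{(j)}_L\end{bmatrix}.$$

Thus, $\alpha^{(j)}_l = 0$ for all $l \ge L+1$. Now, consider the following three cases:

(i) $L = L_j$: then $\alpha^{(j)}_0 = \beta^{(j)}_0$ and $\alpha^{(j)}_l = 0$ for $1 \le l \le L$.
(ii) $L < L_j$: then $\alpha^{(j)}_l = 0$ for $0 \le l \le L$.
(iii) $L > L_j$: then $\alpha^{(j)}_l = \beta^{(j)}_{L-L_j-l}$ for $0 \le l \le L-L_j$, and $\alpha^{(j)}_l = 0$ for $L-L_j+1 \le l \le L$.

For each $j$, define the $(L-L_j+1)^+\times1$ vector $\boldsymbol{\alpha}^{(j)} = [\alpha^{(j)}_0, \ldots, \alpha^{(j)}_{L-L_j}]^T$; if $L < L_j$, let $\boldsymbol{\alpha}^{(j)}$ be null. Then, by (C.8),

$$\mathbf{h}' = \sum_{j=1}^{K}\mathcal{H}^{(j)}_L\boldsymbol{\alpha}^{(j)},$$

or $\mathbf{h}' \in \operatorname{Range}\{[\mathcal{H}^{(1)}_L, \ldots, \mathcal{H}^{(K)}_L]\}$. This proves statement (2).

Now, conversely, assume statement (2). Then $\mathbf{h}' = \sum_{j=1}^{K}\mathcal{H}^{(j)}_L\boldsymbol{\gamma}^{(j)}$, where $\boldsymbol{\gamma}^{(j)} = [\gamma^{(j)}_1, \ldots, \gamma^{(j)}_{(L-L_j+1)^+}]^T$ is an $(L-L_j+1)^+\times1$ vector. Define the $(N+L_j)\times(N+L)$ matrix

$$\boldsymbol{\Gamma}^{(j)}(N) = \begin{bmatrix}
\gamma^{(j)}_1 & \gamma^{(j)}_2 & \cdots & \gamma^{(j)}_{(L-L_j+1)^+} & & \\
 & \ddots & \ddots & & \ddots & \\
 & & \gamma^{(j)}_1 & \gamma^{(j)}_2 & \cdots & \gamma^{(j)}_{(L-L_j+1)^+}
\end{bmatrix}.$$

Note that if $L < L_j$, then $\boldsymbol{\Gamma}^{(j)}(N) = \mathbf{0}_{(N+L_j)\times(N+L)}$. Then it is simple to verify that

$$\mathbf{H}'(N) = \sum_{j=1}^{K}\mathbf{H}^{(j)}(N)\boldsymbol{\Gamma}^{(j)}(N),$$

which proves statement (1).

□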
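Both directions of the lemma can be exercised numerically. The sketch below builds $\mathbf{h}'$ from statement (2), i.e. $\mathbf{h}'(l) = \sum_j\sum_c \gamma^{(j)}_c\mathbf{h}^{(j)}(l-c)$, checks the factorization $\mathbf{H}'(N) = \sum_j \mathbf{H}^{(j)}(N)\boldsymbol{\Gamma}^{(j)}(N)$, and confirms that appending $\mathbf{H}'(N)$ to $\mathbf{H}(N)$ does not raise the rank, whereas a generic random $\mathbf{h}'$ does. It uses real-valued random data and 0-based indexing for the entries of $\boldsymbol{\gamma}^{(j)}$; the block ordering of $\mathbf{H}^{(j)}(N)$ follows the display above, and the helper names are hypothetical.

```python
import numpy as np

def block_toeplitz(h, N):
    """MN x (N + L) matrix: block row r, column c equals h(c - r) (zero outside 0..L).

    h has shape (L + 1, M) with h[l] = h(l) in R^M.
    """
    L1, M = h.shape
    H = np.zeros((M * N, N + L1 - 1))
    for r in range(N):
        for l in range(L1):
            H[r * M:(r + 1) * M, r + l] = h[l]
    return H

def gamma_matrix(g, N, Lj, L):
    """(N + L_j) x (N + L) matrix whose row r holds g starting at column r."""
    G = np.zeros((N + Lj, N + L))
    for r in range(N + Lj):
        G[r, r:r + len(g)] = g
    return G

rng = np.random.default_rng(3)
M, N, orders, L = 3, 10, [1, 2], 2            # min L_j <= L <= max L_j
chans = [rng.standard_normal((Lj + 1, M)) for Lj in orders]
gammas = [rng.standard_normal(max(L - Lj + 1, 0)) for Lj in orders]

# Statement (2): h' = sum_j H_L^{(j)} gamma^{(j)} (a sum of shifted channel copies).
h_prime = np.zeros((L + 1, M))
for h, g in zip(chans, gammas):
    for c, gc in enumerate(g):
        h_prime[c:c + len(h)] += gc * h

H = np.hstack([block_toeplitz(h, N) for h in chans])
H_prime = block_toeplitz(h_prime, N)
factored = sum(block_toeplitz(h, N) @ gamma_matrix(g, N, len(h) - 1, L)
               for h, g in zip(chans, gammas))

rank_H = np.linalg.matrix_rank(H)
rank_joint = np.linalg.matrix_rank(np.hstack([H, H_prime]))
rank_random = np.linalg.matrix_rank(
    np.hstack([H, block_toeplitz(rng.standard_normal((L + 1, M)), N)]))
print(np.allclose(H_prime, factored), rank_H, rank_joint, rank_random)
```

Here $\mathbf{H}(N)$ is $30\times23$, so its range is a proper subspace, and the random comparison shows that statement (1) is a genuine restriction on $\mathbf{h}'$.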
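Now, having this lemma at hand, theorem 4.14 can be proven.

Proof. Let $n^* = \sum_{j=1}^{K}L_j$. If (a)-(d) are satisfied, then column-rank$(\mathbf{S}(n^*))$ = row-rank$(\mathbf{H}(n^*)) = K + (K+1)n^*$. Let $\mathbf{v} \in \operatorname{null}(\mathcal{I}(\vartheta))$, where $\mathbf{v}$ is partitioned as

$$\mathbf{v} = \begin{bmatrix}\mathbf{h}'^{(1)}\\ \mathbf{s}'^{(1)}\\ \vdots\\ \mathbf{h}'^{(K)}\\ \mathbf{s}'^{(K)}\end{bmatrix}.$$

Then $\sum_{k=1}^{K}\left[(\mathbf{I}_M\otimes\mathbf{S}^{(k)})\mathbf{h}'^{(k)} + \mathbf{H}^{(k)}\mathbf{s}'^{(k)}\right] = \mathbf{0}$, or, in the alternative model in (C.5),

$$\sum_{k=1}^{K}\mathbf{S}^{(k)}(n^*)\mathbf{H}'^{(k)}(n^*) + \sum_{k=1}^{K}\mathbf{S}'^{(k)}(n^*)\mathbf{H}^{(k)}(n^*) = \mathbf{0},$$

or

$$[\mathbf{S}(n^*)\ \ \mathbf{S}'(n^*)]\begin{bmatrix}\mathbf{H}'(n^*)\\ \mathbf{H}(n^*)\end{bmatrix} = \mathbf{0}.$$

Therefore $\operatorname{nullity}([\mathbf{S}(n^*)\ \mathbf{S}'(n^*)]) \ge \operatorname{rank}\left(\begin{bmatrix}\mathbf{H}'(n^*)\\ \mathbf{H}(n^*)\end{bmatrix}\right)$. Since

$$\begin{aligned}
\operatorname{nullity}([\mathbf{S}(n^*)\ \mathbf{S}'(n^*)]) &= \operatorname{columns}([\mathbf{S}(n^*)\ \mathbf{S}'(n^*)]) - \operatorname{rank}([\mathbf{S}(n^*)\ \mathbf{S}'(n^*)])\\
&\le 2K + 2(K+1)n^* - \operatorname{rank}(\mathbf{S}(n^*))\\
&= K + (K+1)n^*
\end{aligned}$$

and

$$\operatorname{rank}\left(\begin{bmatrix}\mathbf{H}'(n^*)\\ \mathbf{H}(n^*)\end{bmatrix}\right) \ge \operatorname{rank}(\mathbf{H}(n^*)) = K + (K+1)n^*,$$

it follows that $\operatorname{nullity}([\mathbf{S}(n^*)\ \mathbf{S}'(n^*)]) = \operatorname{rank}\left(\begin{bmatrix}\mathbf{H}'(n^*)\\ \mathbf{H}(n^*)\end{bmatrix}\right) = K + (K+1)n^*$. Since $\operatorname{rank}(\mathbf{H}(n^*)) = K + (K+1)n^*$ as well, the rows of $\mathbf{H}'(n^*)$ lie in the row space of $\mathbf{H}(n^*)$, so there exists some matrix $\mathbf{T}$ such that $\mathbf{H}'(n^*) = \mathbf{T}\mathbf{H}(n^*)$. Thus, by the lemma, for each $k$, $\mathbf{h}'^{(k)} = \sum_{j=1}^{K}\mathcal{H}^{(j)}_{(k)}\boldsymbol{\gamma}^{(k,j)}$ for some $(L_k-L_j+1)^+$ length vector $\boldsymbol{\gamma}^{(k,j)}$. In the alternative model of (C.5), $\mathbf{H}'^{(k)}(n^*) = \sum_{j=1}^{K}\boldsymbol{\Gamma}^{(k,j)}\mathbf{H}^{(j)}(n^*)$,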
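where $\boldsymbol{\Gamma}^{(k,j)}$ is the $(L_k+1+n^*)\times(L_j+1+n^*)$ banded lower-triangular Toeplitz matrix

$$\boldsymbol{\Gamma}^{(k,j)} = \begin{bmatrix}
\gamma^{(k,j)}_1 & 0 & \cdots & 0\\
\gamma^{(k,j)}_2 & \gamma^{(k,j)}_1 & \ddots & \vdots\\
\vdots & \gamma^{(k,j)}_2 & \ddots & 0\\
\gamma^{(k,j)}_{(L_k-L_j+1)^+} & \vdots & \ddots & \gamma^{(k,j)}_1\\
0 & \gamma^{(k,j)}_{(L_k-L_j+1)^+} & & \gamma^{(k,j)}_2\\
\vdots & \ddots & \ddots & \vdots\\
0 & \cdots & 0 & \gamma^{(k,j)}_{(L_k-L_j+1)^+}
\end{bmatrix},$$

and this defines $\mathbf{T}$ as

$$\mathbf{T} = \begin{bmatrix}
\boldsymbol{\Gamma}^{(1,1)} & \boldsymbol{\Gamma}^{(1,2)} & \cdots & \boldsymbol{\Gamma}^{(1,K)}\\
\boldsymbol{\Gamma}^{(2,1)} & \boldsymbol{\Gamma}^{(2,2)} & \cdots & \boldsymbol{\Gamma}^{(2,K)}\\
\vdots & & \ddots & \vdots\\
\boldsymbol{\Gamma}^{(K,1)} & \boldsymbol{\Gamma}^{(K,2)} & \cdots & \boldsymbol{\Gamma}^{(K,K)}
\end{bmatrix}.$$

Then, since

$$[\mathbf{S}(n^*)\ \ \mathbf{S}'(n^*)]\begin{bmatrix}\mathbf{T}\mathbf{H}(n^*)\\ \mathbf{H}(n^*)\end{bmatrix} = \left(\mathbf{S}(n^*)\mathbf{T} + \mathbf{S}'(n^*)\right)\mathbf{H}(n^*) = \mathbf{0}$$

and $\mathbf{H}(n^*)$ is full row rank, it can be seen that $\mathbf{S}'(n^*) = -\mathbf{S}(n^*)\mathbf{T}$, or $\mathbf{S}'^{(k)}(n^*) = -\sum_{j=1}^{K}\mathbf{S}^{(j)}(n^*)\boldsymbol{\Gamma}^{(j,k)}$, which implies, in the vector-matrix model in (4.3), that $\mathbf{s}'^{(k)} = -\sum_{j=1}^{K}\mathcal{S}^{(j)}_{(k)}\boldsymbol{\gamma}^{(j,k)}$. Let $\boldsymbol{\gamma}$ stack the vectors $\boldsymbol{\gamma}^{(1,1)}, \boldsymbol{\gamma}^{(1,2)}, \ldots, \boldsymbol{\gamma}^{(K,K)}$ conformably with the columns of $\mathcal{N}$ in (4.14). Then $\mathbf{v} = \mathcal{N}\boldsymbol{\gamma}$ and $\operatorname{span}(\mathcal{N}) = \operatorname{null}(\mathcal{I}(\vartheta))$.

□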